INDEX
Explanations
names mentioned within text
the repeated mention of names in various contexts
Negative Logits
OPLE         -0.77
UTERS        -0.72
yrinth       -0.71
Constructed  -0.71
irth         -0.70
UGE          -0.69
Returns      -0.66
ETHOD        -0.62
Springs      -0.62
Decre        -0.62
Positive Logits
paces        1.64
pace         1.25
paced        1.11
plates       0.99
ames         0.96
hips         0.90
peed         0.90
erver        0.90
aliases      0.88
names        0.87
Activations Density: 0.028%