INDEX
Explanations
proper nouns related to names of people or places
repeated instances of the token "ne"
Negative Logits
rador      -1.01
hips       -0.87
rament     -0.85
inarily    -0.83
rican      -0.80
enhagen    -0.78
allery     -0.77
IAL        -0.76
orsi       -0.75
redited    -0.74
Positive Logits
arest      1.02
zel        0.90
cker       0.86
theless    0.86
gan        0.84
jad        0.84
cks        0.83
phrine     0.83
cht        0.83
verend     0.83
Activations Density: 0.020%