INDEX
Explanations
proper nouns
the repeated occurrence of the substring "ne" within words
Negative Logits
rador      -0.96
hips       -0.83
rament     -0.80
inarily    -0.79
allery     -0.77
glim       -0.76
orsi       -0.76
rican      -0.74
enhagen    -0.74
ENCY       -0.73
Positive Logits
arest      0.97
cht        0.96
jad        0.94
phrine     0.90
verend     0.90
zel        0.89
gan        0.89
cks        0.88
gger       0.86
zi         0.86
Activations Density: 0.022%