Explanations
proper names, particularly those related to educational and governmental figures
Negative Logits
Token    Logit
seau     -0.18
avery    -0.17
zew      -0.16
ardon    -0.15
oku      -0.15
uya      -0.15
lider    -0.15
attle    -0.14
inski    -0.14
Äĥ       -0.14
Positive Logits
Token    Logit
uni      0.15
оÑĢоÑĤ   0.15
zk       0.14
wayne    0.14
eeper    0.14
Roma     0.14
Den      0.14
objc     0.14
Mn       0.13
207      0.13
Activations Density 0.081%