INDEX
Explanations
proper nouns, particularly names of people and places
Negative Logits
qli       -0.15
.mvp      -0.15
pty       -0.14
âng       -0.14
oland     -0.14
Äĥr       -0.14
辺        -0.14
ovna      -0.14
ropsych   -0.14
spender   -0.13
Positive Logits
baugh     0.18
kowski    0.16
iskey     0.15
Fu        0.14
owski     0.14
wine      0.14
alla      0.14
rud       0.13
ise       0.13
wor       0.13
Activations Density 0.310%