Explanations
names of historical or notable individuals
Negative Logits
Fat       -0.17
kola      -0.16
lington   -0.16
ATO       -0.15
egin      -0.14
fat       -0.14
angelo    -0.14
neh       -0.14
ato       -0.14
elas      -0.14
Positive Logits
_AUX      0.16
sonian    0.16
pg        0.15
craft     0.15
voks      0.14
Horny     0.14
筒        0.14
ール      0.14
(iOS      0.13
riel      0.13
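The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. The dashboard does not say how these values are computed. A common approach for sparse-autoencoder features, assumed here and not confirmed by this page, is to project the feature's decoder direction onto the model's unembedding matrix and keep the k most negative and k most positive entries. A minimal sketch with toy, invented vectors (`logit_effects` and `top_and_bottom` are hypothetical helpers):

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed method: dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-d example; these vectors are invented for illustration only.
    decoder_dir = [0.5, -0.2]
    unembed = {"A": [0.3, 0.1], "B": [-0.4, 0.2], "C": [0.1, -0.5], "D": [0.0, 0.0]}
    neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("most negative:", [tok for tok, _ in neg])  # ['B', 'D']
    print("most positive:", [tok for tok, _ in pos])  # ['C', 'A']
```

A real dashboard would run the same ranking over the full vocabulary with k = 10, which matches the ten entries shown in each list.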
Activation Density: 0.140%
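If density is read as the share of tokens on which the feature is active (an assumption; the page does not define it), 0.140% means the feature fires on roughly 14 of every 10,000 tokens. The sketch below assumes that definition and treats any strictly positive activation as "active"; `activation_density` is a hypothetical helper, not the dashboard's code.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of tokens on which the feature is
    active, with "active" meaning an activation strictly above the threshold.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Invented data: 14 active tokens out of 10,000.
    acts = [0.0] * 9_986 + [1.0] * 14
    print(f"{activation_density(acts):.3f}%")  # prints 0.140%
```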