Explanations
the mention of names, specifically first or last names
Negative Logits
Token        Logit
i            -0.22
и            -0.21
ÛĮ           -0.18
ing          -0.17
iens         -0.17
iem          -0.17
vap          -0.17
aes          -0.17
cname        -0.16
iÄĻ          -0.16
Positive Logits
Token        Logit
itionally    0.21
dest         0.20
missible     0.19
pole         0.19
ron          0.19
venture      0.18
uate         0.18
ulent        0.17
elf          0.17
de           0.16
Activations Density: 0.082%