INDEX
Explanations
names of prominent individuals in sports and politics
Negative Logits
Token       Logit
itself      -0.17
.fm         -0.14
antro       -0.14
anst        -0.13
_FM         -0.13
년도별      -0.13
ruž         -0.13
jednotliv   -0.13
jedn        -0.12
ункт        -0.12
Positive Logits
Token       Logit
Jr          0.41
who         0.38
who         0.36
whom        0.32
Sr          0.28
—who        0.27
himself     0.26
's          0.25
jr          0.25
whose       0.25
Activations Density 0.735%