INDEX
Explanations
mentions of educational institutions and athletic achievements
Negative Logits
Token       Logit
iminal      -0.17
Roose       -0.17
Kata        -0.15
emmel       -0.15
unger       -0.15
itele       -0.14
ahan        -0.14
ullan       -0.14
ays         -0.14
chiều       -0.14
Positive Logits
Token       Logit
erek        0.16
anna        0.14
state       0.13
ADE         0.13
ITTER       0.13
itational   0.13
Sweet       0.13
.bid        0.12
337         0.12
USR         0.12
Activations Density: 0.033%