INDEX
Explanations
references to people's names or nicknames
Negative Logits
Token      Logit
âĨIJ       -0.16
åĤ         -0.15
dz         -0.14
USR        -0.14
ulously    -0.14
ways       -0.14
818        -0.14
edium      -0.14
ths        -0.13
nech       -0.13
Positive Logits
Token      Logit
acock      0.28
oria       0.23
aches      0.22
ugeot      0.22
formance   0.21
eling      0.21
anuts      0.21
ACE        0.21
oples      0.20
uliar      0.19
Activations Density 0.011%