INDEX
Explanations
actions that imply physical interaction or violence
Negative Logits
jít      -0.15
itten    -0.15
ijd      -0.14
abela    -0.14
mind     -0.14
izzo     -0.13
vfs      -0.13
ILLA     -0.13
iom      -0.13
okino    -0.13
Positive Logits
Earn     0.17
lege     0.15
撃       0.14
clave    0.14
ients    0.14
ekim     0.14
awe      0.14
udas     0.14
353      0.14
zahl     0.14
Activations Density 0.187%