Negative Logits
strong          -0.08
Thing           -0.07
explanation     -0.07
manipulation    -0.07
сама            -0.07
better          -0.06
trick           -0.06
záznam          -0.06
yên             -0.06
served          -0.06
Positive Logits
الد             0.06
yr              0.06
ไทย             0.06
owed            0.06
Franç           0.06
nant            0.06
.makeText       0.06
кав             0.06
さんは          0.06
Jacksonville    0.06
Activations Density 0.028%