Negative Logits
App          0.56
Comm         0.55
Hi           0.55
Option       0.54
Bonjour      0.53
Person       0.52
Oper         0.51
Associate    0.51
Or           0.50
Emma         0.50
Positive Logits
trình            0.55
pointless        0.53
thiết            0.50
ی                0.49
про              0.49
lợi              0.49
lawsuit          0.49
दावे             0.49
ranked           0.48
susceptibility   0.48
Activations Density: 0.000%