INDEX
NEGATIVE LOGITS
invari          -0.09
Invariant       -0.09
invariant       -0.08
_SMS            -0.08
_PROM           -0.08
discriminatory  -0.08
Каз             -0.08
подраз          -0.08
paar            -0.08
pencil          -0.08
POSITIVE LOGITS
welcomed        0.11
apologized      0.11
greeted         0.10
apolog          0.10
greetings       0.10
traer           0.10
歉              0.10
Welcome         0.09
enthusiastic    0.09
welcome         0.09
Activations Density 0.038%