Negative Logits
arbeitet  0.88
që  0.84
évő  0.83
こと  0.83
gì  0.82
vais  0.82
cleft  0.82
ҥ  0.82
Believe  0.81
sendiri  0.81
Positive Logits
be  0.92
their  0.90
convincing  0.87
their  0.83
motivating  0.81
our  0.79
top  0.77
persuading  0.74
realistic  0.73
Their  0.72
Activations Density 0.034%