NEGATIVE LOGITS
-away               -0.08
stirred             -0.07
willing             -0.07
_down               -0.06
misunderstanding    -0.06
likely              -0.06
pull                -0.06
able                -0.06
puzzles             -0.06
jump                -0.06
POSITIVE LOGITS
excessive           0.15
cessive             0.13
excessively         0.10
aşırı               0.07
=:                  0.07
dr                  0.07
overly              0.07
ie                  0.07
щи                  0.07
IC                  0.07
Activations Density 0.006%