Explanations
phrases indicating opposition or resistance
Negative Logits
jména            -0.81
談社               -0.77
houſe            -0.75
ERTY             -0.70
placés           -0.66
âgées            -0.64
собі             -0.63
Txn              -0.63
pensato          -0.62
nemo             -0.61
Positive Logits
Against          1.74
Against          1.73
against          1.57
AGAINST          1.52
against          1.49
gegen            1.32
contre           1.18
tegen            1.04
melawan          0.97
SuppressMessage  0.95
Activation Density: 0.074%