Explanations
phrases indicating opposition or disagreement
Negative Logits
jména      -0.73
houſe      -0.69
nemo       -0.67
ERTY       -0.66
florales   -0.65
Txn        -0.65
談社       -0.64
Wren       -0.64
placés     -0.61
собі       -0.60
Positive Logits
Against    2.03
Against    2.02
against    1.91
against    1.84
AGAINST    1.80
gegen      1.48
contre     1.35
tegen      1.24
melawan    1.14
против     1.12
Activations Density: 0.062%