Explanations
phrases indicating opposition or objections
Negative Logits
Token      Logit
jména      -0.77
houſe      -0.71
florales   -0.69
placés     -0.66
nemo       -0.65
ERTY       -0.64
собі       -0.62
ſch        -0.62
談社       -0.61
âgées      -0.61
Positive Logits
Token      Logit
Against    2.22
Against    2.22
against    2.15
against    2.05
AGAINST    1.95
gegen      1.66
contre     1.46
tegen      1.38
melawan    1.26
против     1.23
Activations Density: 0.077%