INDEX
Explanations
phrases indicating opposition or confrontation
Negative Logits
jména           -0.73
ERTY            -0.71
houſe           -0.69
nemo            -0.68
談社            -0.64
bezeichneter    -0.64
florales        -0.63
Txn             -0.62
собі            -0.61
placés          -0.61
Positive Logits
Against         1.92
Against         1.91
against         1.80
against         1.73
AGAINST         1.72
gegen           1.39
contre          1.30
tegen           1.23
melawan         1.08
против          1.05
Activations Density: 0.087%