Explanations
phrases indicating negation or opposition
punctuation and indicators of non-affirmative or negative statements
Negative Logits
zel        -0.74
reet       -0.73
haul       -0.72
ilated     -0.66
amiliar    -0.65
osite      -0.64
eworks     -0.64
nuts       -0.60
redes      -0.60
estinal    -0.60
Positive Logits
nor            1.87
anymore        1.51
yet            1.37
Nor            1.11
nor            1.10
unless         0.96
Instead        0.96
unless         0.94
Nonetheless    0.90
whatsoever     0.87
Activations Density 0.940%