Explanations
the word "don't" preceded by 'I'
statements expressing denial or rejection
Negative Logits
Token        Logit
newcom       -0.65
misdem       -0.65
princ        -0.63
Mandatory    -0.61
eleph        -0.61
anwhile      -0.61
challeng     -0.60
subur        -0.59
populated    -0.59
enthusi      -0.58
Positive Logits
Token        Logit
't            1.64
ned           1.11
´             1.05
ÃŃ            0.96
ovan          0.91
gered         0.85
ates          0.85
uts           0.85
nel           0.83
essee         0.83
Activations Density: 0.082%
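
To give a rough sense of scale for the density figure, here is a minimal sketch. It assumes that "density" means the fraction of sampled tokens on which the feature has a nonzero activation; the source does not define the term, so treat that reading as an assumption. The function name and the token counts are hypothetical and used only for illustration.

```python
# Sketch only. Assumption: "density" is the percentage of sampled tokens
# on which this feature activates at all (the source does not define it).
DENSITY_PERCENT = 0.082  # value reported for this feature


def expected_active_tokens(n_tokens: int, density_pct: float = DENSITY_PERCENT) -> float:
    """Return the expected number of tokens on which the feature fires."""
    return n_tokens * density_pct / 100.0


if __name__ == "__main__":
    # Hypothetical sample sizes, chosen only to illustrate the scale.
    for n in (10_000, 1_000_000):
        print(f"{n:>9} tokens -> about {expected_active_tokens(n):.1f} active")
```

Under that assumption the feature is sparse: it would fire on fewer than one token in a thousand.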