INDEX
Explanations
contractions of specific verbs and pronouns
negations or words indicating inability or denial
Negative Logits
eleph       -0.92
anwhile     -0.87
pione       -0.85
newcom      -0.80
senal       -0.79
ccording    -0.75
aditional   -0.74
Þ           -0.74
conclud     -0.73
enthusi     -0.73
Positive Logits
't          1.58
ned         0.91
ÃŃ          0.90
Õ           0.90
´           0.78
eness       0.78
uts         0.76
nen         0.74
ny          0.74
ovan        0.74
Activations Density 0.155%
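
The density figure above can be read as the fraction of sampled tokens on which the feature fires. The sketch below assumes that reading, with "fires" meaning an activation strictly above zero; the page does not state the exact definition or threshold. The function name, the threshold parameter, and the sample values are all hypothetical and are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with activation above
    the threshold (zero by default). The source page does not confirm this.
    """
    if len(activations) == 0:
        raise ValueError("activations must contain at least one value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2000 tokens, of which 3 have a non-zero activation.
sample = [0.0] * 1997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.155% would mean the feature is active on roughly 1 or 2 tokens out of every 1000 sampled.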