Explanations
contractions with "don't"
instances of the phrase "I don't" expressing negation or refusal
Negative Logits
eleph         -0.96
pione         -0.90
exting        -0.88
Þ             -0.88
enthusi       -0.87
ò             -0.86
aditional     -0.85
practition    -0.84
newcom        -0.84
ñ             -0.82
Positive Logits
't            1.67
ned           1.07
´             0.91
ÃŃ            0.81
ovan          0.78
iversity      0.76
fortunately   0.75
Ê             0.75
`             0.74
ates          0.74
Activations Density: 0.090%