Explanations
the word "don't."
negative contractions, particularly "don't"
Negative Logits
Token         Logit
afore         -0.67
vanquished    -0.62
Species       -0.59
ejected       -0.59
VERS          -0.58
ipel          -0.57
Calls         -0.57
elimination   -0.57
Completed     -0.57
Casting       -0.56
Positive Logits
Token         Logit
't            1.57
ned           1.23
ates          0.98
ning          0.93
atives        0.87
uts           0.86
nell          0.84
ate           0.84
nels          0.83
etsk          0.83
Activations Density: 0.132%