INDEX
Explanations
contractions of "do not", with the strongest weight on instances of the contraction "don't"
negations or forms of the word "don't."
Negative Logits
reluct       -0.97
newcom       -0.95
exha         -0.93
enthusi      -0.93
Þ            -0.91
pione        -0.91
aditional    -0.88
challeng     -0.88
princ        -0.86
conclud      -0.85
Positive Logits
't           1.62
ned          1.18
ning         1.03
ates         0.93
uts          0.92
keys         0.84
ate          0.84
´            0.83
\'           0.82
ners         0.81
Activations Density 0.113%
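
As a rough illustration of how to read the density figure: assuming it means the fraction of sampled token positions on which the feature has a nonzero activation (the page itself does not define it), it could be computed as in the sketch below. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all; the dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 100,000 token positions, 113 of them active.
toy_activations = [0.0] * 99_887 + [0.5] * 113
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under that reading, a density of 0.113% corresponds to roughly 113 active tokens per 100,000 sampled, which fits a feature that fires on a narrow pattern such as "don't".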