INDEX
Explanations
mentions of the word "no" followed by various phrases
negation or denial phrases
Negative Logits
nonetheless     -0.68
minster         -0.63
ially           -0.63
lus             -0.59
Cathy           -0.58
nevertheless    -0.58
ATIVE           -0.57
iership         -0.57
turned          -0.56
RED             -0.56
Positive Logits
vel           0.99
zzle          0.93
otrop         0.86
obs           0.83
xious         0.83
longer        0.81
vell          0.81
except        0.80
warranties    0.79
isy           0.79
Activations Density 0.055%