INDEX
Explanations
the word "terrible."
expressions of negative opinions or criticisms
Negative Logits
Token       Logit
ership      -0.96
pai         -0.88
ilus        -0.87
irs         -0.85
ovember     -0.78
cript       -0.78
ersen       -0.77
eters       -0.76
aver        -0.76
illance     -0.76
Positive Logits
Token       Logit
havoc       0.86
nightmares  0.85
sounding    0.83
headache    0.83
awful       0.80
nightmare   0.79
horrible    0.79
adolesc     0.78
smelling    0.75
karma       0.74
Activations Density 0.013%
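
The page does not say how the density figure is computed. A common reading is the fraction of sampled tokens on which the feature activates at all. The sketch below assumes that definition; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to reproduce a value of the same size as the one shown.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, 13 of which activate the feature.
sample = [0.0] * 99_987 + [1.5] * 13
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.013%
```

Under this reading, a density of 0.013% means the feature fires on roughly 13 of every 100,000 tokens, which fits a narrow feature tied to strongly negative words.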