INDEX
Explanations
instances of the word "upset," indicating emotions of dissatisfaction or distress
Negative Logits
glas         -0.85
livest       -0.81
ographies    -0.75
audi         -0.70
istered      -0.70
gravity      -0.69
haar         -0.69
perty        -0.69
estones      -0.68
atures       -0.67
Positive Logits
dy           0.86
der          0.80
stomach      0.75
ingly        0.74
quished      0.72
upset        0.72
NESS         0.70
bur          0.70
roy          0.69
ful          0.67
Activations Density 0.008%