INDEX
Explanations
statements related to strong negative emotions, particularly anger
expressions of anger
Negative Logits
artifacts    -0.70
livest       -0.70
Places       -0.69
arent        -0.68
ums          -0.67
glas         -0.67
issue        -0.67
agate        -0.66
sites        -0.66
Vide         -0.65
Positive Logits
anger        0.91
ingly        0.89
fulness      0.87
indignation  0.86
ful          0.78
resentment   0.77
fury         0.72
rage         0.71
outburst     0.69
wart         0.69
Activations Density: 0.013%