INDEX
Explanations
words related to negative emotions like shame, guilt, pity, anger, and judgment
expressions of shame and related emotions
Negative Logits
enhagen     -0.82
ersive      -0.79
ichick      -0.73
agall       -0.73
ancial      -0.72
aeda        -0.70
IFE         -0.69
irements    -0.68
agnetic     -0.67
ighters     -0.65
Positive Logits
faced       1.33
fully       1.30
face        1.00
fulness     0.99
imaru       0.97
ously       0.93
ful         0.93
Shame       0.90
shame       0.90
judgement   0.82
Activations Density 0.044%