INDEX
Explanations
phrases related to negative emotions, particularly shame
references to the concept of shame and associated emotional expressions
Negative Logits
ersive     -0.86
enhagen    -0.84
ancial     -0.78
ichick     -0.73
IFE        -0.73
aeda       -0.73
agnetic    -0.72
agall      -0.67
ullivan    -0.67
ITNESS     -0.66
Positive Logits
faced        1.34
fully        1.30
face         0.97
ously        0.91
ful          0.91
fulness      0.91
shame        0.89
imaru        0.86
judgement    0.85
Shame        0.84
Activations Density 0.029%
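
To put the density figure in concrete terms, here is a minimal sketch that converts it into an expected activation count. It assumes that "activations density" means the percentage of tokens on which the feature has a nonzero activation; the page itself does not define the term. The function names are hypothetical and chosen only for illustration.

```python
def density_to_fraction(density_percent: float) -> float:
    """Convert a density given in percent (e.g. 0.029) to a plain fraction."""
    return density_percent / 100.0


def expected_activations(density_percent: float, n_tokens: int) -> int:
    """Expected number of tokens on which the feature fires, rounded to an integer.

    Assumption: density is the share of tokens with nonzero activation.
    """
    return round(density_to_fraction(density_percent) * n_tokens)


def tokens_per_activation(density_percent: float) -> float:
    """Average number of tokens between activations under the same assumption."""
    return 1.0 / density_to_fraction(density_percent)


if __name__ == "__main__":
    density = 0.029  # percent, as reported on the page
    print(expected_activations(density, 1_000_000))
    print(round(tokens_per_activation(density)))
```

Under that reading, a density of 0.029% corresponds to roughly 3 activations per 10,000 tokens, so the feature fires rarely.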