INDEX
Explanations
words related to feelings of embarrassment or guilt
instances and discussions of shame
Negative Logits
Token                Logit
ancial               -0.84
enhagen              -0.75
aeda                 -0.73
ersive               -0.72
ichick               -0.71
ITNESS               -0.71
natureconservancy    -0.71
thus                 -0.71
intendo              -0.70
opez                 -0.70
Positive Logits
Token                Logit
faced                1.33
fully                1.27
Shame                1.04
shame                1.00
face                 0.90
ful                  0.88
ously                0.87
ashamed              0.85
imaru                0.85
fulness              0.82
Activations Density: 0.012%