INDEX
Explanations
words related to negative emotions like shame and remorse
expressions of shame and remorse
Negative Logits
Trend     -0.85
tailed    -0.77
tails     -0.76
yip       -0.72
anto      -0.66
tail      -0.63
risome    -0.63
papers    -0.62
stakes    -0.61
ties      -0.61
Positive Logits
ashamed   0.97
xual      0.95
blush     0.82
cheeks    0.76
remorse   0.75
nikov     0.71
iates     0.68
leeve     0.67
ript      0.67
perture   0.67
Activations Density 0.022%