INDEX
Explanations
words related to negative emotions, particularly sadness
expressions and themes related to sadness
Negative Logits
ouver         -0.78
byter         -0.68
RAFT          -0.68
cellent       -0.67
FANT          -0.65
authorized    -0.65
orthy         -0.65
iltration     -0.64
fielded       -0.64
guided        -0.64
Positive Logits
der           1.25
omas          1.16
istic         1.07
istically     1.02
Sad           0.98
die           0.90
sad           0.86
stal          0.85
mouth         0.84
imaru         0.82
Activations Density 0.014%