INDEX
Explanations
words related to negative emotions, particularly sadness
expressions of sadness or sorrow
Negative Logits
ouver          -0.76
entials        -0.71
authorized     -0.68
RAFT           -0.65
IBLE           -0.63
Ranked         -0.63
iltration      -0.63
VERTISEMENT    -0.62
vernment       -0.61
guided         -0.61
Positive Logits
der            1.28
omas           1.25
istic          1.13
istically      1.12
Sad            0.93
die            0.91
stal           0.85
imaru          0.85
mouth          0.84
sad            0.81
Activations Density 0.020%
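
For readers unfamiliar with the density figure, here is a minimal sketch of one common way to read it: the percentage of token positions on which the feature fires. It assumes a feature counts as active when its activation is strictly greater than zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 2 of which activate the feature.
sample = [0.0] * 9998 + [3.1, 0.7]
density = activation_density(sample)
print(f"{density:.3f}%")  # prints "0.020%"
```

Under this reading, a density of 0.020% means the feature is active on roughly 2 of every 10,000 tokens, which is what one would expect of a sparse feature.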