INDEX
Explanations
words related to strong negative emotions, specifically disgust and horror
expressions of strong negative emotions, particularly disgust and horror
Negative Logits
arta      -0.81
ingham    -0.75
ieth      -0.75
ept       -0.68
etheus    -0.67
uin       -0.66
pec       -0.65
pler      -0.65
ilt       -0.64
arial     -0.63
Positive Logits
Zucker       0.85
ĸļ           0.80
ptin         0.73
disgusted    0.70
ingly        0.68
disgust      0.68
Viz          0.68
fur          0.67
lihood       0.67
::::::::     0.65
Activations Density: 0.041%