INDEX
Explanations
expressions of disappointment
Negative Logits
ilic         -0.72
ilian        -0.72
ittee        -0.71
alach        -0.70
hens         -0.70
skirts       -0.70
running      -0.69
ossession    -0.67
ioch         -0.67
livest       -0.64
Positive Logits
actory            0.86
disappoint        0.84
disappointment    0.83
imaru             0.80
disappointed      0.75
loser             0.72
ments             0.71
ingly             0.67
losers            0.66
fully             0.63
Activations Density: 0.047%