INDEX
Explanations
expressions of disappointment
Negative Logits
ittee       -0.70
running     -0.69
monary      -0.69
skirts      -0.68
xon         -0.67
llular      -0.65
ifa         -0.65
ahu         -0.64
ermanent    -0.62
uto         -0.62
Positive Logits
disappoint      0.89
actory          0.81
disappointment  0.80
ments           0.78
fully           0.74
omission        0.73
ingly           0.73
disappointed    0.73
ful             0.72
loser           0.72
Activations Density: 0.068%