INDEX
Explanations
phrases related to emotions and opinions
negative contractions, particularly related to unwillingness or refusal
Negative Logits
mixed       -0.62
Windsor     -0.62
SERV        -0.62
soph        -0.60
contrasted  -0.60
Friends     -0.59
fused       -0.59
shack       -0.58
blurred     -0.58
partners    -0.57

Positive Logits
t           1.09
nt          0.94
else        0.92
¹           0.90
ivably      0.87
ttle        0.87
erest       0.87
onna        0.85
swer        0.83
be          0.83

Activation Density: 0.078%