INDEX
Explanations
phrases indicating a choice or action contrary to an expected or suggested course of action
phrases that indicate contrasts or alternatives
Negative Logits
meric    -0.80
fifth    -0.71
ancer    -0.69
izer     -0.69
lass     -0.68
most     -0.65
third    -0.65
iple     -0.65
read     -0.65
mma      -0.64
Positive Logits
focusing         1.12
relying          1.10
wasting          1.06
letting          1.04
blaming          0.98
being            0.97
acknowledging    0.97
rever            0.96
apologizing      0.95
having           0.94
Activations Density: 0.050%