Explanations
words related to making decisions or choices
instances of the word "choose" and its variations in the context of decision-making
Negative Logits
Token               Logit
brance              -0.85
itamin              -0.68
ptoms               -0.65
forcing             -0.64
è¦ļéĨĴ              -0.64
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯    -0.63
mons                -0.62
otor                -0.62
ONY                 -0.60
dfx                 -0.58
Positive Logits
Token               Logit
wisely              1.04
to                  0.86
not                 0.71
instead             0.70
sides               0.69
chosen              0.66
chose               0.66
whichever           0.64
randomly            0.64
eters               0.63
Activations Density: 0.044%