Explanations
Information about decision-making and options
Expressing preference or choice
Choosing between options
Negative Logits
Token            Logit
marle            -0.61
Amour            -0.50
"@/              -0.46
[++              -0.44
                 -0.44
sensi            -0.43
readdir          -0.43
\{\\             -0.43
tania            -0.43
overexpression   -0.43
Positive Logits
Token            Logit
preference       1.16
choosing         1.15
choose           1.15
whichever        1.11
Choosing         1.10
preferred        1.10
Choosing         1.08
choose           1.06
chose            1.05
chosen           1.03
Activations Density 0.607%