Explanations
actions related to decision-making
instances of the phrase "choose to" indicating decision-making
Negative Logits
til           -0.77
Canary        -0.69
listed        -0.68
ILY           -0.68
reperto       -0.64
inguished     -0.62
ritical       -0.62
pins          -0.62
manuscripts   -0.61
Five          -0.61
Positive Logits
pursue        1.08
ignore        0.99
make          0.99
explore       0.97
give          0.97
maximize      0.96
prioritize    0.96
minimize      0.96
create        0.94
marry         0.94
Activations Density 0.067%