INDEX
Explanations
terms related to decision-making
references to important decisions or choices made in various contexts
Negative Logits
english     -0.81
vae         -0.77
eco         -0.69
ubric       -0.67
ingers      -0.66
amen        -0.63
sung        -0.63
izont       -0.63
ewitness    -0.62
atomic      -0.62
Positive Logits
makers      0.97
decision    0.86
maker       0.84
decisions   0.83
ACTIONS     0.83
stance      0.74
maker       0.72
ters        0.69
chose       0.69
jar         0.69
Activations Density 0.033%