Explanations
words related to decision-making processes
decision making process
Negative Logits
the          -0.66
The          -0.57
*            -0.46
femininas    -0.46
<bos>        -0.45
wide         -0.44
crazy        -0.44
ys           -0.43
five         -0.42
Is           -0.41
Positive Logits
Decision      1.11
decision      1.06
decision      1.04
Decision      1.03
DECISION      0.94
Decisions     0.91
decisions     0.86
DECISION      0.86
Decisions     0.83
decisions     0.80
Activations Density: 0.013%