INDEX
Explanations
positively evaluated actions or decisions
phrases indicating sound decision-making or judgments
NEGATIVE LOGITS
oreal       -0.72
ļéĨĴ        -0.70
anguage     -0.68
stories     -0.68
icone       -0.68
aura        -0.68
resy        -0.67
ench        -0.66
ampions     -0.66
ringe       -0.65
POSITIVE LOGITS
decisions   1.11
decision    1.10
timing      1.09
execution   1.05
prudent     1.04
manoeuv     0.99
omission    0.99
deliberate  0.96
choices     0.94
executed    0.92
Activation density: 0.687%