INDEX
Explanations
terms related to making important choices or judgements
mentions of significant decisions or choices
Negative Logits
vae       -0.74
ingers    -0.70
rake      -0.68
hemat     -0.66
outh      -0.66
rawling   -0.64
ammers    -0.64
Dak       -0.62
amen      -0.62
Smile     -0.62
Positive Logits
decision         1.09
decisions        1.03
makers           0.83
stance           0.80
DragonMagazine   0.79
warr             0.74
Decision         0.73
calculus         0.72
choices          0.71
decides          0.71
Activations Density: 0.032%