INDEX
Explanations
words related to significant choices or actions
references to significant decisions
Negative Logits
vae        -0.77
english    -0.72
havoc      -0.67
amen       -0.67
icas       -0.67
tremend    -0.67
outh       -0.67
uum        -0.66
ighth      -0.65
ingers     -0.65
Positive Logits
makers     1.01
jar        0.94
maker      0.86
decision   0.83
making     0.81
ACTIONS    0.80
maker      0.78
decisions  0.77
makers     0.75
lessness   0.71
Activations Density 0.037%