INDEX
Explanations
specific decisions or actions being taken regarding various subjects
instances of the word "decision."
Negative Logits
Token      Logit
vae        -0.78
english    -0.72
eco        -0.71
ingers     -0.69
sung       -0.69
ubric      -0.68
amen       -0.67
icas       -0.66
nat        -0.66
ogene      -0.65
Positive Logits
Token      Logit
makers     0.99
ACTIONS    0.86
maker      0.85
decisions  0.82
decision   0.79
jar        0.79
Garc       0.73
regarding  0.72
maker      0.70
crop       0.70
Activations Density: 0.035%