INDEX
Explanations
phrases indicating decisions or choices to be made
Negative Logits
oil          -0.78
atha         -0.77
acements     -0.72
tremend      -0.70
athi         -0.68
atched       -0.68
ikan         -0.68
oing         -0.66
ffff         -0.65
outh         -0.64
Positive Logits
whether       1.00
decide        0.75
decisions     0.74
deciding      0.72
unanimously   0.72
how           0.70
differently   0.69
decisively    0.69
decides       0.69
upon          0.69
Activations Density 0.033%