INDEX
Explanations
questions or statements expressing curiosity about future outcomes
questions and phrases expressing curiosity about future events or outcomes
Negative Logits
arest          -0.68
GV             -0.66
WATCHED        -0.65
ansom          -0.65
cautioned      -0.63
question       -0.63
ashington      -0.63
guiActiveUn    -0.62
Nonetheless    -0.61
stated         -0.61
Positive Logits
happens        1.02
unfolds        0.85
develops       0.83
shenanigans    0.83
pops           0.81
others         0.74
pans           0.73
else           0.73
reactions      0.70
happen         0.70
Activations Density: 0.071%