Explanations
phrases suggesting hypothetical situations with specific actions or consequences
conditional statements and hypothetical scenarios
Negative Logits
TION        -0.66
ahead       -0.61
Enough      -0.59
Cook        -0.58
Brill       -0.58
Ready       -0.58
sis         -0.57
Ahead       -0.57
Tok         -0.57
contained   -0.57
Positive Logits
expect      1.31
imagine     1.29
suppose     1.19
assume      1.13
presume     1.12
wonder      1.03
agine       0.99
speculate   0.99
think       0.97
argue       0.96
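On SAE feature dashboards, logit lists like these are commonly produced by projecting the feature's decoder vector through the model's unembedding matrix and ranking the per-token scores. That this page's numbers were computed this way is an assumption, not something the page states. The minimal pure-Python sketch below follows that assumption; `logit_effects`, `top_logits`, and the toy weights are hypothetical.

```python
# Hypothetical sketch: assumes a feature's logit effect on a token is the dot
# product of its SAE decoder vector with that token's unembedding column.

def logit_effects(decoder_row, unembed, vocab):
    """Score every vocabulary token.

    decoder_row: d_model floats (the feature's decoder vector).
    unembed: d_model rows, each holding one float per vocabulary token.
    vocab: token strings, one per unembedding column.
    """
    d_model = len(decoder_row)
    return [
        (tok, sum(decoder_row[i] * unembed[i][j] for i in range(d_model)))
        for j, tok in enumerate(vocab)
    ]


def top_logits(decoder_row, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(logit_effects(decoder_row, unembed, vocab), key=lambda p: p[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy example: d_model = 2, four-token vocabulary (values are made up).
decoder_row = [1.0, 0.5]
unembed = [
    [1.0, -1.0, 0.0, 2.0],
    [0.0, 1.0, -2.0, 0.0],
]
vocab = ["expect", "ahead", "Enough", "imagine"]
negative, positive = top_logits(decoder_row, unembed, vocab, k=2)
print(negative)  # [('Enough', -1.0), ('ahead', -0.5)]
print(positive)  # [('imagine', 2.0), ('expect', 1.0)]
```

Under this assumption, a positive score (e.g. `expect` at 1.31) means the feature pushes that token's logit up, and a negative score (e.g. `TION` at -0.66) means it pushes it down. With a real model, `decoder_row` has length d_model and `unembed` spans the full vocabulary. The ranking step is unchanged.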
Activation density: 0.072%
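Activation density is read here as the percentage of token positions on which the feature fires, meaning its activation is above zero. Both that definition and the helper `activation_density` below are assumptions for illustration. At 0.072%, the feature would be active on roughly 72 of every 100,000 tokens.

```python
# Hypothetical helper: assumes "density" means the percentage of token
# positions whose feature activation is strictly above a threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Percentage of token positions on which the feature fires."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 72 firing positions out of 100,000 tokens (synthetic data).
acts = [0.0] * 99_928 + [1.5] * 72
print(activation_density(acts))  # 0.072
```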