INDEX
Explanations
questions about predicting or envisioning future outcomes
inquiries about potential outcomes or consequences
Negative Logits
Token       Logit
ament       -0.70
tesy        -0.67
cius        -0.61
idated      -0.61
ilts        -0.61
arge        -0.61
mens        -0.61
stra        -0.60
inently     -0.60
lication    -0.60
Positive Logits
Token       Logit
next        0.96
afterward   0.82
AFTER       0.82
NEXT        0.80
uate        0.80
afterwards  0.79
overnight   0.77
when        0.76
onstage     0.75
backstage   0.73
Activations Density 0.051%
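
The density figure above can be read as a simple fraction. The following is a minimal sketch, assuming density means the share of tokens on which the feature's activation is above zero; the page does not state its exact definition, and the function name, the threshold parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature
    fires at all. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 51 of 100,000 tokens.
sample = [1.0] * 51 + [0.0] * 99_949
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.051%
```

Under this reading, a density of 0.051% means the feature is active on roughly one token in every two thousand.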