Explanations
the word "imagine."
phrases prompting hypothetical scenarios or thought experiments
Negative Logits
Token     Logit
bard      -0.87
ged       -0.75
inals     -0.75
die       -0.73
fund      -0.68
hide      -0.66
woods     -0.66
unker     -0.65
inance    -0.63
wa        -0.63
Positive Logits
Token      Logit
ĸļ         0.92
lihood     0.83
how        0.74
ufact      0.73
imagine    0.72
msec       0.70
aloud      0.69
eers       0.68
orial      0.67
imagining  0.65
Activations Density: 0.027%