INDEX
Explanations
questions about hypothetical scenarios or possibilities
speculative questions that begin with "What would"
Negative Logits
noticed    -0.67
haus       -0.66
held       -0.66
chio       -0.65
holm       -0.63
claimed    -0.63
illary     -0.60
trap       -0.60
cro        -0.60
traced     -0.60
Positive Logits
you        0.89
YOU        0.81
anyone     0.77
anybody    0.77
be         0.76
motivate   0.75
suffice    0.73
happen     0.73
ĸļ         0.73
acan       0.72
Activations Density: 0.041%