INDEX
Explanations
questions or rhetorical queries
Negative Logits
yak          -0.66
ishable      -0.65
striped      -0.64
apan         -0.64
lock         -0.63
soap         -0.63
foam         -0.62
worm         -0.62
apter        -0.62
docks        -0.62
Positive Logits
Surely        1.03
Wouldn        1.02
Nope          0.99
Certainly     0.96
[undecodable token]  0.94
Conversely    0.92
Presumably    0.92
Perhaps       0.91
Probably      0.90
Why           0.89
Activations Density 0.110%