INDEX
Explanations
questions ending with a question mark
questions posed in the text
Negative Logits
Token         Logit
yak           -0.66
ishable       -0.66
marsh         -0.66
wilderness    -0.65
lock          -0.64
rio           -0.64
hob           -0.63
worm          -0.63
striped       -0.62
space         -0.62
Positive Logits
Token                  Logit
Surely                 1.08
Nope                   1.07
Wouldn                 1.01
Certainly              0.97
Answer                 0.96
[undecodable token]    0.96
Perhaps                0.95
Probably               0.93
Presumably             0.93
Sadly                  0.91
Activations Density: 0.113%
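
The page does not say how the density figure is computed. A minimal sketch, assuming it is the share of sampled tokens on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from this page:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 1000 tokens, the feature fires on 3 of them.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.113% would mean the feature fires on roughly one token in every 885 sampled.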