INDEX
Explanations
questions within sentences
rhetorical questions
Negative Logits
apan       -0.68
yak        -0.68
potion     -0.66
docks      -0.66
dex        -0.65
lock       -0.64
corrid     -0.64
ishable    -0.64
arm        -0.63
hob        -0.62
Positive Logits
Surely       1.08
Nope         1.08
Wouldn       1.03
����         0.97
Certainly    0.97
Answer       0.94
Perhaps      0.93
Probably     0.93
Presumably   0.93
Answer       0.92
Activations Density: 0.116%