Explanations
sentences that begin with the word "Sure."
Negative Logits
ERS       -0.18
sWith     -0.17
ers       -0.16
ses       -0.16
robe      -0.16
oggler    -0.15
olls      -0.15
sim       -0.15
orph      -0.15
omm       -0.15
Positive Logits
fire       0.47
-fire      0.42
ty         0.38
foot       0.37
-foot      0.34
nder       0.32
ties       0.29
Fire       0.28
-shot      0.28
st         0.28
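Dashboards of this kind commonly obtain the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The page does not say how these particular scores were computed, so the following is only a minimal sketch under that assumption. `decoder_dir`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical names, and real pipelines may also apply a final layer norm or other scaling that is omitted here.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Project a feature's decoder direction through an unembedding matrix.

    decoder_dir: list of d_model floats (hypothetical feature direction).
    unembed: d_model rows, each a list of vocab_size floats (hypothetical W_U).
    vocab: list of vocab_size token strings.
    Returns a dict mapping each token to its direct logit effect.
    """
    scores = {}
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        scores[tok] = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
    return scores


def top_and_bottom(scores, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    positive = ranked[-k:][::-1]  # largest scores first
    negative = ranked[:k]         # most negative scores first
    return positive, negative


# Toy example with made-up weights (not the real model's values).
vocab = ["fire", "ERS", "foot"]
decoder_dir = [1.0, -0.5]
unembed = [[0.4, -0.2, 0.1],
           [0.0, 0.2, -0.4]]
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=1)
print(pos, neg)
```

With these toy weights, "fire" has the largest positive effect (about 0.4) and "ERS" the most negative (about -0.3), mirroring the shape of the two lists above without reproducing their values.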
Activation Density: 0.030%
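An activation density of 0.030% means the feature fires on roughly 3 out of every 10,000 tokens. The page does not state the activation threshold or the token sample used, so this minimal sketch assumes a token counts as "active" when its activation is strictly greater than zero. `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature is active (activation > threshold).

    The threshold of 0.0 is an assumption; the source does not specify one.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: 3 active tokens out of 10,000 (made-up values).
acts = [0.0] * 9997 + [1.2, 0.5, 2.0]
density = activation_density(acts)
print(f"{density:.3%}")
```

Formatting the fraction with the `%` presentation type multiplies by 100, so 3 / 10,000 is shown in the same style as the dashboard figure.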