Explanations
instances of the word "too."
Negative Logits
mp      -0.19
pu      -0.19
nào     -0.18
ry      -0.18
ron     -0.17
st      -0.16
ford    -0.16
wood    -0.16
toch    -0.16
w       -0.15
Positive Logits
led         0.26
ledo        0.24
gether      0.24
/from       0.20
thers       0.19
o           0.18
kest        0.17
oooooooo    0.17
xygen       0.16
pez         0.16
Activations Density: 0.037%