Explanations
the word "too" used in various contexts
Negative Logits
nào       -0.16
pu        -0.15
mp        -0.15
brook     -0.15
course    -0.15
kar       -0.15
kker      -0.15
ullo      -0.15
walk      -0.14
ogne      -0.14
Positive Logits
gether     0.22
/from      0.20
led        0.18
thers      0.18
ledo       0.17
kest       0.17
o          0.17
oot        0.15
eko        0.15
ichni      0.15
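Lists like the two above are often produced by projecting a feature's output direction onto the model's unembedding vectors and keeping the most negative and most positive entries. This page does not say how its numbers were computed, so the sketch below only assumes that method. The direction, the four-token vocabulary, and the helper names `logit_effects` and `top_and_bottom` are all hypothetical, and the toy values are unrelated to the ones listed.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot a (hypothetical) feature direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy data: a 3-dimensional residual stream and a four-token vocabulary (made up).
direction = [0.5, -0.2, 0.1]
unembed = {
    "gether": [0.4, 0.0, 0.2],
    "led": [0.2, 0.1, 0.0],
    "walk": [-0.1, 0.3, 0.0],
    "course": [0.0, 0.5, -0.3],
}
effects = logit_effects(direction, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", [(t, round(v, 2)) for t, v in positive])
print("negative:", [(t, round(v, 2)) for t, v in negative])
```

Sorting once and slicing from both ends gives both lists in a single pass; with a real model the same idea applies, but the vectors come from the trained weights.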
Activation density: 0.028%
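Activation density is commonly defined as the share of tokens on which a feature's activation is above zero; under that assumed definition, 0.028% means the feature fires on roughly 28 tokens in every 100,000. A minimal sketch of that calculation, where `activation_density` and its `threshold` parameter are hypothetical names:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Hypothetical sample: 28 firing tokens out of 100,000.
sample = [0.0] * 99_972 + [1.0] * 28
print(f"{activation_density(sample):.3f}%")  # prints 0.028%
```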