Explanations
occurrences of the word "to" indicating intention or direction
Negative Logits
kil        -0.16
stead      -0.15
/we        -0.14
ragen      -0.14
fillType   -0.14
abilia     -0.14
sticks     -0.13
mium       -0.13
midt       -0.13
zelf       -0.13
Positive Logits
gether      0.28
/from       0.26
plevel      0.25
ogle        0.20
ledo        0.19
/about      0.19
pline       0.17
asting      0.16
iling       0.16
tem         0.16
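Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix, then taking the tokens with the highest and lowest scores. The following is a minimal sketch of that computation under stated assumptions: it uses plain Python lists, the names `decoder_dir`, `w_unembed`, `vocab` and `feature_logit_effects` are hypothetical, and it ignores any final layer-norm scaling a particular dashboard may apply. The usage values are toy numbers, not this feature's weights.

```python
from typing import List, Sequence, Tuple

TokenScore = Tuple[str, float]


def feature_logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],  # hypothetical layout: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Return (top_positive, top_negative) token/score pairs for one feature.

    Sketch only: each token's score is the dot product of the feature's
    decoder direction with that token's column of the unembedding matrix.
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    # Reverse so the most positive tokens come first.
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three tokens.
    pos, neg = feature_logit_effects(
        decoder_dir=[1.0, 0.5],
        w_unembed=[[0.2, -0.4, 0.1], [0.4, 0.2, -0.6]],
        vocab=["a", "b", "c"],
        k=1,
    )
    print(pos, neg)
```

Sorting once and slicing from both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial top-k selection would avoid the full sort.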
Activation Density: 0.892%
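Activation density is the share of token positions on which the feature fires. A value of 0.892% means roughly 9 of every 1,000 tokens. The following is a minimal sketch, assuming "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the threshold default are hypothetical, and a given dashboard may compute this differently.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose feature activation exceeds `threshold`.

    Sketch only: assumes one activation value per token position.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 9 active positions out of 1,000 gives a density of 0.9%.
    sample = [0.0] * 991 + [1.2] * 9
    print(activation_density(sample))
```

Counting in a single pass lets the function accept a generator, so activations streamed from a large corpus never need to be held in memory at once.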