Explanations
instances of movement or action verbs
Negative Logits
Token          Logit
ording         -0.17
änger          -0.14
ield           -0.14
ddf            -0.14
inkel          -0.13
interchange    -0.13
ko             -0.13
luv            -0.13
kommen         -0.13
airo           -0.13
Positive Logits
Token          Logit
charging        0.24
looking         0.23
bounding        0.21
bol             0.18
tum             0.18
looking         0.18
gal             0.17
Charging        0.17
around          0.17
Looking         0.17
Activation Density: 0.135%