Explanations
phrases indicating movement or direction
Negative Logits
Token    Logit
ingo     -0.17
ulen     -0.17
agen     -0.15
edic     -0.15
mic      -0.14
ingly    -0.14
gil      -0.14
inge     -0.14
pcf      -0.14
пи       -0.14
Positive Logits
Token    Logit
toward   0.16
towards  0.16
into     0.16
418      0.15
onto     0.15
the      0.15
840      0.14
iore     0.14
izabeth  0.14
into     0.14
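Logit lists like the two above are often produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the result. The sketch below assumes that convention; the function name `top_logits`, the toy matrices, and the placeholder tokens are hypothetical and are not taken from this dashboard.

```python
import numpy as np

def top_logits(decoder_row, W_U, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction
    (shape d_model) through an unembedding W_U (shape d_model x d_vocab)
    and return the k most negative and k most positive tokens."""
    # logits[v] = decoder_row . W_U[:, v], one value per vocabulary entry
    logits = decoder_row @ W_U
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with d_model = 2 and four placeholder tokens.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[1.0, 0.5, -1.0, -0.5],
                [0.0, 0.5,  0.0, -0.5]])
decoder_row = np.array([0.16, 0.0])
neg, pos = top_logits(decoder_row, W_U, vocab, k=2)
```

Here the toy logits are 0.16, 0.08, -0.16 and -0.08, so `pos` holds tokens "a" and "b" and `neg` holds "c" and "d".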
Activation Density: 0.119%
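Activation density is commonly defined as the percentage of token positions at which the feature fires (activation above zero). The sketch below assumes that definition and a threshold of 0; the helper name `activation_density` and the synthetic activations are illustrative only.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of token positions whose feature
    activation is strictly greater than `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Synthetic example: a feature that fires on 12 of 10,000 tokens.
acts = np.zeros(10_000)
acts[:12] = 1.0
density = activation_density(acts)
```

With 12 active positions out of 10,000, `density` is 0.12 (percent), the same order as the 0.119% reported above, i.e. the feature is active on roughly one token in a thousand.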