Explanations
verbs and phrases indicating movement or departure
Negative Logits
ibi      -0.15
iston    -0.15
758      -0.15
016      -0.15
rang     -0.15
ono      -0.15
ullet    -0.14
istrat   -0.14
bject    -0.14
utz      -0.14
Positive Logits
heading       0.31
headed        0.30
toward        0.29
towards       0.28
bound         0.26
heading       0.24
destination   0.23
Heading       0.23
Heading       0.23
前往          0.22
Activations Density 0.079%
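
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of token positions on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample, not data from this page:
# 100,000 token positions, 79 of which have a nonzero activation.
sample = [0.0] * 99_921 + [1.0] * 79
print(f"{activation_density(sample):.3%}")
```

A density this low would mean the feature fires on fewer than one token in a thousand, which fits a feature tied to a narrow set of words.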