INDEX
Explanations
actions associated with movement or departure
Negative Logits
Token    Logit
ahn      -0.16
rong     -0.16
itre     -0.15
تقو      -0.14
nl       -0.14
738      -0.14
758      -0.14
nown     -0.14
glas     -0.14
dos      -0.13
Positive Logits
Token      Logit
leaving    0.30
Leaving    0.25
leave      0.22
Leave      0.20
leaves     0.20
toward     0.20
Leave      0.20
headed     0.19
into       0.19
direction  0.19
Activations Density 0.124%