Explanations
phrases related to movement or direction
Negative Logits
миÑĤ        -0.15
irr         -0.15
mrt         -0.15
ledge       -0.15
asa         -0.15
agan        -0.15
ÙIJÙħ        -0.14
alm         -0.14
iterate     -0.14
Dumpster    -0.13
Positive Logits
/down       0.19
towards     0.18
STACK       0.16
river       0.15
toward      0.15
into        0.15
wards       0.15
ardy        0.14
empo        0.14
ières       0.14
Activations Density 0.051%