Explanations
actions involving pushing or displacing others
Negative Logits
Token    Logit
arl      -0.18
arin     -0.17
arpa     -0.15
eln      -0.15
賢       -0.15
desar    -0.14
691      -0.14
gloss    -0.14
882      -0.14
ene      -0.14
Positive Logits
Token       Logit
into        0.30
into        0.27
Into        0.25
onto        0.24
Into        0.22
_into       0.21
INTO        0.20
onto        0.19
.into       0.17
:UIAlert    0.16
Activations Density: 0.064%