INDEX
Explanations
phrases indicating movement or direction
Negative Logits
p       -0.15
Mart    -0.14
_MA     -0.14
ast     -0.13
ดย      -0.13
depr    -0.13
sey     -0.13
Mon     -0.13
pl      -0.13
ats     -0.13
Positive Logits
ucker    0.16
vu       0.15
beck     0.15
Trick    0.15
criptor  0.14
ubat     0.14
ç´       0.14
lessly   0.14
tür      0.14
大全     0.14
Activations Density: 0.137%