INDEX
Explanations
phrases indicating possession or belonging
Negative Logits
archical    -0.16
женÑĮ       -0.15
/fw         -0.15
à¥įदर       -0.15
äd          -0.15
kul         -0.14
ä½³         -0.13
recur       -0.13
ÅĻev        -0.13
oras        -0.13
Positive Logits
ones        0.16
isson       0.15
(<?         0.14
ison        0.14
distance    0.14
roller      0.14
aset        0.13
ãĥ¬ãĥĥãĥĪ   0.13
vak         0.13
aged        0.13
Activations Density 0.035%