INDEX
Explanations
adverbs and words indicating recent or ongoing conditions and actions
Negative Logits
inci      -0.16
.nlm      -0.15
779       -0.14
fty       -0.14
ubi       -0.14
stone     -0.14
унк       -0.13
isan      -0.13
olation   -0.13
گران      -0.13
Positive Logits
lamaz       0.16
whose       0.15
worth       0.15
EEDED       0.14
ifestyles   0.14
лага        0.14
which       0.14
езда        0.14
cui         0.14
べ          0.13
Activations Density 0.230%