INDEX
Explanations
descriptive words following "and"
Negative Logits
Token    Logit
ر    0.39
edhe    0.38
ו    0.38
in    0.38
你    0.37
8    0.36
י    0.35
ли    0.35
е    0.34
ే    0.33
Positive Logits
Token    Logit
ﺐ    0.33
ﺘ    0.33
ке    0.33
чення    0.32
at    0.32
sebagainya    0.32
ﺮ    0.31
inciting    0.30
ﺪ    0.30
ಶ್ರ    0.30
Activations Density 1.749%