INDEX
Explanations
measurable attributes and states
Negative Logits
ک  0.48
לא  0.48
し  0.43
لا  0.43
نا  0.39
بال  0.37
ます  0.36
ニ  0.36
۔  0.35
א  0.35
Positive Logits
at  0.47
am  0.47
ak  0.47
on  0.44
i  0.44
trustee  0.41
el  0.40
ig  0.39
and  0.39
Ryanair  0.38
Activations Density: 1.832%