Explanations
processing provided information
Negative Logits
en    0.86
ق    0.84
ad    0.84
in    0.78
ar    0.75
ic    0.71
al    0.71
ap    0.71
u    0.70
et    0.66
Positive Logits
х    0.67
ИН    0.65
ERS    0.63
き    0.62
อย่าง    0.61
immobil    0.60
в    0.59
ה    0.59
なく    0.58
ال    0.58
Activations Density 1.082%