Explanations
foreign languages or scripts
Negative Logits
komple 0.55
Nok 0.53
Sund 0.52
vores 0.51
sund 0.50
poro 0.50
intolerable 0.49
oC 0.49
Dris 0.49
Vort 0.48
Positive Logits
وا 0.73
У 0.71
ک 0.68
ע 0.67
گ 0.66
इ 0.65
ع 0.64
А 0.64
? 0.64
من 0.63
Activations Density 0.000%
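The density figure is reported as a percentage. The sketch below assumes it means the share of sampled token positions on which this feature's activation is above zero. The dashboard does not state its exact definition or sample size, so treat this as an assumption. `activation_density` and its `threshold` parameter are hypothetical names used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density = (# tokens with activation > threshold)
    / (total # tokens) * 100. The dashboard's exact definition may differ.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("activations must be non-empty")
    # Count token positions where the feature fires above the threshold.
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Example: a feature that fires on 2 of 1,000,000 sampled tokens.
acts = [0.0] * 1_000_000
acts[10] = 1.3
acts[500_000] = 0.8
density = activation_density(acts)
print(f"{density:.3f}%")  # prints 0.000% (unrounded value is 0.0002%)
```

Under this assumption, a reading of 0.000% does not mean the feature never fires. It only means the feature fires on too small a share of tokens to show at three decimal places.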