Explanations
discourse markers and structural phrases
Negative Logits
س    0.53
ھی    0.46
RESOL    0.46
CIVIL    0.46
Numeral    0.44
Coordin    0.43
POLICY    0.43
هی    0.43
င    0.42
ጄ    0.42
POSITIVE LOGITS
博文    0.51
ensburg    0.46
apache    0.45
blast    0.45
net    0.44
linker    0.44
sphere    0.43
enige    0.41
abuse    0.41
msub    0.40
Activations Density: 0.002%