INDEX
Explanations
) followed by newline or separator
Negative Logits
на    1.73
و    1.69
ان    1.52
ल    1.52
en    1.44
ار    1.26
وها    1.24
та    1.21
is    1.20
k    1.17
Positive Logits
써    0.97
ς    0.94
mengatur    0.87
kalangan    0.87
mẽ    0.86
佚    0.81
Selle    0.80
kelamin    0.80
pełni    0.78
integrantes    0.77
Activations Density 0.601%