Explanations
mixed language or translations
NEGATIVE LOGITS
en    1.01
क्शन    0.98
𝐡    0.96
İŞ    0.95
м    0.94
शिका    0.92
l    0.92
暴    0.90
ını    0.90
еру    0.89
POSITIVE LOGITS
Тем    1.29
হইবার    1.25
Josephine    1.23
Biodiversity    1.23
Наи    1.19
Chairperson    1.19
Nir    1.18
буду    1.18
testAvg    1.17
Нико    1.17
Activations Density 0.001%