Explanations
No Explanations Found
Negative Logits
Token    Value
as       1.28
ni       1.17
at       1.16
al       1.14
ar       1.07
d        1.05
en       1.03
et       1.03
l        1.03
و        1.02
Positive Logits
Token    Value
ן        1.18
ς        1.16
માં      1.13
ین       1.07
کتاب     1.07
۰        1.07
aument   1.03
ک        1.02
яка      1.02
те       0.98
Activations Density: 0.000%