Negative Logits
ка    1.07
in    0.96
ara    0.96
reservados    0.91
써    0.86
ని    0.85
ik    0.82
ien    0.80
栯    0.80
تاريخ    0.79
Positive Logits
with    1.59
y    1.55
は    1.24
t    1.23
ב    1.20
ת    1.19
ی    1.18
l    1.17
ED    1.14
ם    1.13
Activations Density: 0.436%