INDEX
Explanations
No Explanations Found
Negative Logits
Token   Logit
        1.01
are     0.97
be      0.89
is      0.82
ется    0.82
an      0.73
it      0.70
{       0.68
as      0.68
۔       0.67
Positive Logits
Token   Logit
an      1.25
u       1.16
in      1.09
a       1.09
z       1.05
p       1.02
on      0.96
et      0.96
x       0.96
k       0.96
Activations Density 0.000%