Explanations
No Explanations Found
Negative Logits
Token    Value
↵        0.48
ла       0.46
u        0.43
น        0.40
ون       0.39
in       0.39
p        0.37
im       0.36
و        0.36
us       0.35
Positive Logits
Token    Value
a        0.62
(blank)  0.57
is       0.54
at       0.48
an       0.43
of       0.38
to       0.38
the      0.35
{        0.34
la       0.34
Activations Density 18.890%