INDEX
Negative Logits
f  0.45
that  0.43
↵  0.43
ა  0.39
ী  0.39
ת  0.39
d  0.38
ل  0.38
z  0.37
ुभ  0.36
POSITIVE LOGITS
be  0.66
[token not shown]  0.59
\  0.53
\  0.49
e  0.48
on  0.48
of  0.47
to  0.46
<  0.44
{  0.43
Activations Density 0.397%