NEGATIVE LOGITS
Token    Logit
I        0.91
v        0.83
sh       0.78
the      0.76
ib       0.76
.        0.74
ir       0.73
rit      0.73
ő        0.73
ali      0.72
POSITIVE LOGITS
Token    Logit
was      1.07
were     0.98
has      0.88
to       0.84
heeft    0.82
had      0.82
ות       0.82
is       0.78
hadden   0.77
(        0.75
Activations Density 0.005%