Negative Logits
It          1.30
is          1.22
it          1.15
This        1.02
They        1.01
That        0.99
on          0.98
un          0.98
are         0.98
of          0.96
Positive Logits
ем          1.14
↵           1.12
accueill    0.94
jeopard     0.92
;           0.90
.           0.88
,           0.86
noy         0.85
ون          0.82
hurt        0.80
Activations Density 0.029%