Negative Logits
Token    Logit
to       1.66
ح        1.20
ین       1.14
ے        1.10
(        1.09
ight     1.08
0        1.03
𝟬        1.03
기       1.02
for      0.98
POSITIVE LOGITS
Token    Logit
a        1.74
.        1.41
s        1.39
n        1.36
d        1.36
u        1.22
h        1.15
ের       1.13
)        1.12
i        1.11
Activations Density 0.003%