INDEX
Negative Logits
m         1.09
was       1.09
are       1.08
are       1.04
was       1.03
it        1.02
p         0.98
،         0.96
ד         0.94
methyl    0.93
Positive Logits
3         1.30
4         1.11
8         1.10
5         1.09
9         1.09
7         1.08
6         1.03
grumpy    0.97
З         0.97
恼        0.96
Activations Density 0.065%