INDEX
Negative Logits
Token       Logit
1           -2.69
7           -2.59
2           -2.52
6           -2.42
5           -2.38
4           -2.38
3           -2.31
8           -2.22
and         -2.17
0           -2.08
Positive Logits
Token       Logit
a           3.25
not         2.94
now         2.52
also        2.11
Which       2.05
an          2.03
usually     1.98
interpreta  1.98
morfo       1.94
just        1.93
Activations Density 0.114%