Negative Logits
Token       Logit
Take        -0.85
(           -0.75
,           -0.73
It          -0.65
A           -0.64
[           -0.61
No          -0.61
In          -0.60
Ter         -0.60
for         -0.59

Positive Logits
Token       Logit
Eſ          1.27
Reſ         1.23
ſeveral     1.20
Conſ        1.19
Diſ         1.17
ſelf        1.16
Efq         1.16
iſt         1.14
Monfieur    1.14
ſelves      1.13

Activations Density: 0.077%