NEGATIVE LOGITS
da          -0.73
ser         -0.66
~           -0.59
de          -0.58
no          -0.57
IRUS        -0.56
por         -0.55
van         -0.55
the         -0.55
f           -0.54
POSITIVE LOGITS
purpoſe      1.37
Efq          1.34
Jefus        1.34
itſelf       1.30
pleaſure     1.30
Diſ          1.30
Monfieur     1.30
myſelf       1.29
Reſ          1.25
whoſe        1.23
Activations Density 0.134%