Negative Logits
myſelf        -1.55
itſelf        -1.52
Efq           -1.48
ſeveral       -1.38
himſelf       -1.34
leaſt         -1.34
Monfieur      -1.34
houſe         -1.33
pleaſure      -1.33
themſelves    -1.27
Positive Logits
and           0.65
&             0.56
(blank)       0.54
in            0.53
,             0.47
and           0.47
↵             0.44
to            0.44
(             0.43
/             0.43
Activations Density 0.042%