INDEX
NEGATIVE LOGITS
of        -0.77
-         -0.60
M         -0.55
.         -0.54
–         -0.52
B         -0.52
action    -0.50
Mat       -0.50
…         -0.50
M         -0.50
POSITIVE LOGITS
Monfieur  1.18
myſelf    1.15
houſe     1.05
itſelf    1.05
pleaſure  1.03
ſever     1.02
Efq       1.02
Anſ       0.97
himſelf   0.96
Theſe     0.96
Activations Density 1.652%