INDEX
NEGATIVE LOGITS
er    -0.60
y     -0.58
(     -0.56
.     -0.56
p     -0.51
(     -0.50
l     -0.48
-     -0.47
m     -0.47
n     -0.46
POSITIVE LOGITS
itſelf        1.61
myſelf        1.58
Efq           1.55
pleaſure      1.54
themſelves    1.52
Anſ           1.51
Houſe         1.49
Theſe         1.49
Reſ           1.48
reaſon        1.47
Activations Density 0.245%