NEGATIVE LOGITS
Token       Logit
S           -1.36
myſelf      -1.30
ſelf        -1.26
itſelf      -1.23
T           -1.18
Efq         -1.16
C           -1.16
P           -1.13
themſelves  -1.13
pleaſure    -1.12

POSITIVE LOGITS
Token       Logit
’           0.60
the         0.54
'           0.53
-           0.51
.           0.51
,           0.51
↵↵          0.50
(           0.49
/           0.48
_           0.47
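The two lists appear to rank vocabulary tokens by how strongly this feature lowers (negative) or raises (positive) their logits. The following is a minimal sketch of how such lists could be produced. It assumes that each value is the dot product of the feature's decoder direction with that token's column of the unembedding matrix, with any final layer norm ignored. The function name `top_and_bottom_logits` and the toy inputs are hypothetical.

```python
def top_and_bottom_logits(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit effect) pairs.

    Assumption (not from the source): the logit effect of token j is
    sum_i decoder_dir[i] * unembed[i][j], i.e. decoder direction times unembedding.
    decoder_dir: list of d_model floats
    unembed:     d_model rows, each a list of vocab_size floats
    vocab:       list of vocab_size token strings
    """
    d_model = len(decoder_dir)
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    # Largest effects first, to match the descending POSITIVE LOGITS list.
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example with hypothetical numbers: 2-dimensional model, 3-token vocabulary.
neg, pos = top_and_bottom_logits(
    decoder_dir=[1.0, -0.5],
    unembed=[[0.2, -1.0, 0.5], [0.4, 0.0, -0.6]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

Sorting once and slicing both ends keeps the two lists consistent with each other. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would avoid a full sort.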
Activation density: 0.178%
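The density figure can be read as a simple ratio. This minimal sketch assumes that density is the fraction of dataset tokens on which the feature's activation is strictly positive. The function name `activation_density` and the token counts are hypothetical; the counts were chosen only to reproduce a value of 0.178%.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption (not from the source): a token counts as active when its
    activation is strictly greater than 0.
    """
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 178 active tokens out of 100,000.
acts = [0.0] * 99_822 + [0.5] * 178
density = activation_density(acts)
print(f"{density:.3%}")
```

At this density the feature fires on fewer than 2 tokens in every 1,000. This sparsity is why its most strongly affected tokens (above) are a small, specific set.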