Explanations
references to physical pain or injury
Negative Logits
myſelf    -1.87
pleaſure  -1.85
Efq       -1.80
ſeveral   -1.79
Monfieur  -1.79
―――――     -1.78
purpoſe   -1.76
itſelf    -1.75
houſe     -1.75
Majefty   -1.74
Positive Logits
          0.84
.         0.80
K         0.77
en        0.76
C         0.75
R         0.72
<eos>     0.72
tak       0.72
in        0.70
V         0.70
Activations Density 0.110%