Explanations
the word "enabled" and nearby code snippets
Negative Logits
.        -1.16
-        -0.91
↵        -0.90
(        -0.89
R        -0.86
,        -0.85
e        -0.84
_        -0.83
M        -0.81
P        -0.79
Positive Logits
myſelf    2.00
itſelf    1.91
ſelf      1.89
Efq       1.87
Majefty   1.84
Monfieur  1.83
ſelves    1.83
Jefus     1.82
pleaſure  1.79
iſt       1.78
Activation Density: 0.674%
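The density figure is presumably the share of token positions on which the feature fires. At 0.674%, that works out to roughly 674 tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and its `threshold` parameter are hypothetical and do not come from any dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: a feature "fires" on a token when its activation is > threshold.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: 10,000 token positions, the feature fires on 67 of them.
acts = [0.0] * 9933 + [1.5] * 67
print(round(activation_density(acts), 3))  # 0.67
```

A low density like this one suggests a sparse, fairly specific feature, although the exact threshold a given dashboard uses may differ from the zero assumed here.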