INDEX
Explanations
words indicating relative measurement
comparisons
Negative Logits
Monfieur      -1.88
myſelf        -1.85
Efq           -1.84
Majefty       -1.66
ſeveral       -1.63
raiſ          -1.57
Diſ           -1.56
Jefus         -1.56
Reſ           -1.55
themſelves    -1.55
Positive Logits
              0.92
,             0.91
(             0.91
-             0.84
              0.77
[             0.77
y             0.74
f             0.71
ing           0.69
.             0.69
Activations Density 0.979%