Explanation (New Auto-Interp): dashes or hyphens in the text
Negative logits
Token   Logit
P       -0.64
de      -0.61
sub     -0.58
$\      -0.57
p       -0.56
of      -0.55
$       -0.55
in      -0.54
L       -0.54
B       -0.54
Positive logits
Token   Logit
—       2.03
--      1.96
––      1.93
—-      1.91
”—      1.91
——      1.88
,—      1.82
!—      1.81
)—      1.80
,--     1.77
Activation density: 0.190%
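Activation density is the fraction of token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero (a common convention that the card itself does not confirm); the function name and the activation data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 19 active positions out of 10,000 tokens.
acts = [0.0] * 9981 + [0.7] * 19
print(f"{activation_density(acts):.3%}")  # 0.190%
```

A density this low means the feature fires on roughly 1 in 500 tokens, which is consistent with a narrow, punctuation-specific feature.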