INDEX
Explanations
punctuation marks and their variations in style
dashes and em dashes
Negative Logits
.        -0.42
Rud      -0.40
it       -0.39
H        -0.38
P        -0.38
pape     -0.37
Oil      -0.36
It       -0.35
Is       -0.35
$\       -0.35
Positive Logits
AndEndTag     0.72
Rüyada        0.70
aarrggbb      0.69
[@BOS@]       0.67
<unused17>    0.66
<unused14>    0.66
<unused23>    0.66
<unused28>    0.66
<unused3>     0.66
<unused8>     0.66
Activations Density 0.026%
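The activation density figure can be read as the fraction of token positions on which the feature fires. The sketch below assumes density means the share of positions whose activation is strictly above zero; the exact threshold the dashboard uses is an assumption, `activation_density` is a hypothetical helper, and the activation list is synthetic, not data from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" = share of token positions where the feature is active
    (activation > threshold). The threshold of 0.0 is illustrative.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 100,000 token positions, the feature fires on 26 of them.
acts = [0.0] * 100_000
for i in range(26):
    acts[i * 3000] = 1.5  # arbitrary positive activation at spaced-out positions

density = activation_density(acts)
print(f"{density:.3%}")  # 0.026%
```

Under this reading, a density of 0.026% means the feature is active on roughly 26 of every 100,000 tokens, i.e. it is a very sparse feature.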