Explanations
punctuation marks, specifically periods and their usage in text
Negative Logits
Token    Logit
<eos>    -0.82
I        -0.76
         -0.67
-        -0.66
</sup>   -0.65
R        -0.64
↵↵       -0.64
).       -0.64
?        -0.64
He       -0.64
Positive Logits
Token      Logit
itſelf     1.89
ſelf       1.84
.")        1.83
Majefty    1.79
myſelf     1.77
."));      1.77
Reſ        1.75
!")        1.74
houſe      1.71
Anſ        1.71
Activations Density 0.149%
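
A minimal sketch of how a density figure like the one above could be computed, assuming it denotes the fraction of token positions on which the feature's activation is nonzero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 2,000 token positions, of which 3 have a nonzero activation.
sample = [0.0] * 1997 + [0.7, 1.2, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # formatted as a percentage with three decimals
```

Under this reading, a density of 0.149% would mean the feature fires on roughly 1 or 2 of every 1,000 tokens in the evaluated text.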