INDEX
Explanations
proper nouns and corresponding punctuation marks as they appear at the end of sentences
Negative Logits
¥µ            -0.76
acly          -0.72
("            -0.71
ahime         -0.68
negie         -0.66
roximately    -0.61
—"            -0.60
"             -0.58
avored        -0.58
rely          -0.58
Positive Logits
'.            2.18
,'            2.13
',            2.10
.'            2.09
?'            2.08
','           1.99
';            1.96
'[            1.92
',            1.91
'.            1.90
Activations Density 0.216%