INDEX
Explanations
punctuation marks, primarily at the end of sentences or as part of dialogue
Negative Logits
↵             -0.36
↵↵            -0.22
’s            -0.21
:             -0.21
’t            -0.19
(whitespace)  -0.19
↵ ↵           -0.18
/or           -0.18
’m            -0.18
’re           -0.17
Positive Logits
ÂĿ            0.32
That          0.18
That          0.17
¦             0.17
And           0.17
'↵            0.17
This          0.16
"↵            0.16
This          0.16
And           0.16
Activations Density 0.123%