INDEX
Explanations
punctuation marks and their usage in text
Negative Logits
↵
-0.47
↵↵
-0.27
↵ ↵
-0.24
↵ ↵
-0.22
 
-0.21
↵ ↵
-0.21
,
-0.20
↵ ↵
-0.20
↵ ↵
-0.19
↵ ↵
-0.18
Positive Logits
nodoc
0.29
-↵↵
0.23
-(
0.22
:
0.21
_
0.20
<
0.20
s
0.19
[
0.19
nth
0.19
rolley
0.19
Activations Density: 0.037%