Explanations
punctuation and special characters in the text
Negative Logits
↵             -0.18
>             -0.18
(whitespace)  -0.18
itself        -0.17
\u            -0.17
\n            -0.16
certain       -0.14
\-            -0.14
\<            -0.14
\x            -0.14
Positive Logits
et      0.38
...,    0.26
;       0.24
...     0.23
_et     0.21
...)    0.21
others  0.20
…       0.18
van     0.18
,       0.18
Activations Density 0.004%
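
As a rough illustration of what a density figure like this usually means, the sketch below assumes it is the fraction of sampled token positions on which the feature activates at all. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The name and threshold are hypothetical.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 4 active positions out of 100,000 tokens.
sample = [0.0] * 99_996 + [1.2, 0.7, 3.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.004%
```

Under that reading, a density of 0.004% corresponds to the feature firing on about 4 of every 100,000 tokens, which is a very sparse feature.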