Explanations
sentences featuring parentheses
Negative Logits
Token    Logit
(        -0.30
*        -0.30
[        -0.26
(        -0.26
↵        -0.24
/        -0.23
$        -0.20
!        -0.20
:        -0.20
_        -0.20
Positive Logits
Token    Logit
which    0.27
aka      0.25
or       0.25
...)↵    0.24
with     0.24
see      0.24
for      0.24
…)       0.21
from     0.21
as       0.21
Activations Density 0.425%