Explanations
structural elements, particularly parentheses and brackets, indicating function calls or conditionals in code
Negative Logits
Token     Logit
(n        -0.17
((        -0.15
(s        -0.15
(«        -0.15
lü        -0.15
wort      -0.14
(arg      -0.14
erson     -0.14
urally    -0.14
ï¼Ī       -0.14
Positive Logits
Token     Logit
odont     0.18
usch      0.16
)         0.16
''        0.15
);        0.15
),        0.15
)↵        0.15
);↵↵      0.15
);↵       0.14
]         0.14
Activations Density 0.171%