INDEX
Explanations
instances of code and programming syntax
Negative Logits
for      -0.26
first    -0.20
to       -0.20
not      -0.20
with     -0.20
on       -0.20
and      -0.19
in       -0.19
if       -0.18
do       -0.18
Positive Logits
[token not captured]   0.32
)↵↵                    0.24
↵                      0.21
v                      0.18
.core                  0.18
_core                  0.18
-Core                  0.18
-core                  0.17
g                      0.17
u                      0.17
Activations Density 0.004%