Explanations
calculation and assignment
the presence of numeric tokens and arithmetic/math expressions (numbers and computation-related symbols) in the text.
Negative Logits
TODO          0.50
-{\           0.48
OEt           0.46
+\|\          0.44
nontrivial    0.43
              0.42
IPython       0.42
≳             0.42
              0.42
synt          0.41
Positive Logits
:)            0.55
.......       0.52
"             0.50
.........     0.50
...........   0.50
-->           0.48
............  0.47
Subtract      0.47
........      0.47
              0.46
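The logit lists above rank vocabulary tokens by how much this feature raises or lowers their logits. Below is a minimal NumPy sketch of one common way such lists can be produced. It assumes that the feature has a decoder direction `decoder_vec` that is projected straight through an unembedding matrix `W_U`, and it ignores the final layer norm and any downstream layers. The function name, parameter names and toy data are hypothetical and are not taken from the source.

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Rank tokens by the direct effect of one feature direction on their logits.

    Hypothetical helper. Assumption: the effect on token j is decoder_vec @ W_U[:, j],
    i.e. a plain linear projection with no layer norm or later layers.
    """
    effects = decoder_vec @ W_U            # shape: (vocab_size,)
    order = np.argsort(effects)            # token indices, most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
decoder_vec = np.array([1.0, -2.0])
W_U = np.array([[1.0, 0.0, 0.5, -1.0],
                [0.0, 1.0, 0.5,  0.0]])
vocab = ["a", "b", "c", "d"]
pos, neg = top_logit_effects(decoder_vec, W_U, vocab, k=2)
print(pos)  # [('a', 1.0), ('c', -0.5)]
print(neg)  # [('b', -2.0), ('d', -1.0)]
```

This sketch returns signed effects. The dashboard values above are printed without a sign, so their exact convention is not stated here.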
Activation Density: 0.566%
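Activation density is usually read as the share of token positions on which the feature is active. Below is a minimal sketch that assumes "active" means an activation strictly above a threshold of 0, which is typical for ReLU-style sparse autoencoder features. The helper name and the threshold parameter are hypothetical. Under this reading, 0.566% would correspond to roughly 566 active positions out of 100,000.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper. Assumption: a feature counts as active when its
    activation is strictly greater than the threshold (0 by default).
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 566 active positions out of 100,000 tokens.
acts = np.concatenate([np.ones(566), np.zeros(99_434)])
print(round(activation_density(acts), 3))  # 0.566
```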