    Explanations

    symbolic comparisons and operations in code

    Negative Logits
    Token                     Logit
    [token not captured]      -0.83
    "."                       -0.79
    " I"                      -0.70
    " ("                      -0.64
    ","                       -0.64
    " and"                    -0.62
    " in"                     -0.62
    " B"                      -0.61
    " l"                      -0.59
    " L"                      -0.58
    Positive Logits
    Token                     Logit
    ")>"                       2.44
    ">"                        2.40
    " $>$"                     2.29
    "]>"                       2.22
    ">$"                       2.19
    ">\"                       2.12
    ".>"                       2.12
    " >"                       2.09
    ">."                       2.06
    ">" + line break           2.05
    Act Density 0.425%

    No Known Activations