INDEX
    Explanations

    parentheses and other opening symbols

    Negative Logits
     iſt         -0.82
     Beſ         -0.78
    ")))         -0.75
    '));         -0.74
    }]);         -0.74
     ―――――       -0.72
    %");         -0.68
     $_"         -0.68
     Theſe       -0.67
     Diſ         -0.67
    Positive Logits
     (           1.69
    (\           1.52
    ">(</        1.51
    >(</         1.49
    (            1.48
                 1.48
     }^{(        1.42
    -(           1.40
    {(           1.36
    __(          1.35
    Act Density 1.387%

    No Known Activations