INDEX
    Explanations

    specific formatting or symbols in the text

    Negative Logits
     „                  -1.40
    [token missing]     -1.06
     („                 -0.91
    ′,                  -0.90
     «                  -0.88
    ————————————————    -0.84
     “                  -0.82
     ’                  -0.80
     ‘                  -0.79
    «                   -0.79
    Positive Logits
    .--     1.60
    --"     1.58
    "--     1.55
    '--     1.51
    ,--     1.50
    --      1.48
     --     1.47
    !--     1.40
    --$     1.36
    //--    1.33
    Activation Density: 0.264%

    No Known Activations