INDEX
    Explanations

    punctuation marks and their variations in style

    Negative Logits
    Token      Logit
    "."        -0.42
    " Rud"     -0.40
    " it"      -0.39
    " H"       -0.38
    " P"       -0.38
    " pape"    -0.37
    " Oil"     -0.36
    " It"      -0.35
    " Is"      -0.35
    " $\"      -0.35
    Positive Logits
    Token           Logit
    "AndEndTag"     0.72
    "Rüyada"        0.70
    "aarrggbb"      0.69
    "[@BOS@]"       0.67
    "<unused17>"    0.66
    "<unused14>"    0.66
    "<unused23>"    0.66
    "<unused28>"    0.66
    "<unused3>"     0.66
    "<unused8>"     0.66
    Activation Density: 0.026%

    No Known Activations