INDEX
    Explanations

    instances of backslashes or escape characters

    Negative Logits
     betweenstory       -0.96
     queſta             -0.96
    aarrggbb            -0.91
    iſten               -0.89
    httphttps           -0.88
     Geiſt              -0.85
    Personendaten       -0.85
     يتيمه              -0.85
    <unused8>           -0.85
    <unused14>          -0.85
    Positive Logits
    \                   0.88
     \                  0.73
    1                   0.60
    The                 0.60
    $\                  0.59
    <b>                 0.59
     $\                 0.57
    I                   0.56
                        0.54
    <h2>                0.54
    Act Density 0.006%

    No Known Activations