INDEX
    Explanations

    instances of punctuation and formatting symbols in a document

    Negative Logits
    Token               Logit
    ispiel              -0.14
    @student            -0.13
    ãĥ¼ãĥĹ              -0.13
     Erotische          -0.13
    uele                -0.13
     å®®                -0.13
    ÙĢÙĢÙĢÙĢÙĢÙĢÙĢÙĢ    -0.12
    ellij               -0.12
    ::_                 -0.12
     Rog                -0.12
    Positive Logits
    Token               Logit
    .lng                0.14
     gangbang           0.14
     abras              0.13
     [â̦]                0.13
     swinger            0.13
    .",↵                0.13
     ðŁĺī↵↵             0.13
    .',↵                0.12
     Bud                0.12
     *,↵                0.12
    Activation Density: 0.017%

    No Known Activations