    Explanations

    punctuation and special character patterns, particularly apostrophes and quotation marks

    Negative Logits
    Token       Logit
    åłĤ         -0.16
    984         -0.15
    449         -0.15
    996         -0.14
    ivant       -0.14
    modes       -0.14
    lauf        -0.14
    âĢĮاÙĦ       -0.14
    sse         -0.13
    osu         -0.13

    Positive Logits
    Token       Logit
    ÂĢÂĻ        0.20
    edar        0.15
    нак         0.15
    owed        0.14
    zbollah     0.14
    quist       0.14
    -ts         0.14
    ogene       0.14
    wan         0.14
    âĢį         0.13

    Act Density 0.067%

    No Known Activations