INDEX
    Explanations

    occurrences of formatting and stylistic elements in text

    Negative Logits
    anut      -0.17
    _AES      -0.15
    lien      -0.14
    anship    -0.14
    ublik     -0.14
    uell      -0.14
    undi      -0.14
    ä»ĺ       -0.14
    ennes     -0.14
    ppe       -0.14
    Positive Logits
    IJ        0.16
    Ùħر       0.15
    290       0.14
    аÑĢод     0.14
    itan      0.14
    ικο       0.14
    _bar      0.14
    izer      0.13
    onen      0.13
    ordial    0.13
    Activation Density: 0.010%

    No Known Activations