INDEX
    Explanations

    punctuation and structural elements in the text

    Negative Logits
    "aml"       -0.17
    "stry"      -0.15
    "osph"      -0.15
    "igham"     -0.15
    "odu"       -0.14
    "ós"        -0.14
    "rowned"    -0.14
    "itten"     -0.14
    "atrice"    -0.14
    "atri"      -0.14
    Positive Logits
    "dbe"       0.16
    "_WAKE"     0.15
    "ales"      0.15
    " Wake"     0.15
    " Tal"      0.15
    "zia"       0.14
    " wake"     0.14
    "ÙĦب"       0.14
    "pone"      0.14
    "ÙĬع"       0.14
    Activation Density: 0.100%

    No Known Activations