INDEX
    Explanations

    instances of structured formats or organized lists in written content

    Negative Logits
     latter             -0.17
    athi                -0.15
    raz                 -0.14
     Wrath              -0.14
    zin                 -0.14
    _MT                 -0.14
    .blog               -0.14
    udu                 -0.14
    mts                 -0.13
    ưá»Ŀi                -0.13
    Positive Logits
     Redistributions    0.24
    =-=-=-=-=-=-=-=-    0.18
    ³³                  0.16
    AMI                 0.15
    ï¸                  0.15
    ³³³³³               0.15
    ̣                   0.14
    737                 0.14
     بÙĪØ§Ø¨Ø©          0.13
    ilst                0.13
    Act Density 0.151%

    No Known Activations