INDEX
    Explanations

    numerical patterns and specific structured formats in the text

    Negative Logits
    Token          Logit
    " against"     -0.15
    "ung"          -0.14
    " explosion"   -0.14
    " Ferm"        -0.14
    " hom"         -0.14
    " Sor"         -0.14
    " Teens"       -0.14
    "Closing"      -0.14
    "util"         -0.14
    " tolerant"    -0.13
    Positive Logits
    Token          Logit
    "ckt"          0.17
    "erot"         0.17
    "zza"          0.16
    "itom"         0.15
    "cco"          0.15
    "ocale"        0.14
    "ypress"       0.14
    "edla"         0.14
    "گان"          0.14
    "ifact"        0.14
    Act Density 0.407%

    No Known Activations