    Explanations

    punctuation and specific formatting in written text

    Negative Logits
    Token            Logit
    " Barn"          -0.17
    " Barnes"        -0.15
    "sher"           -0.14
    "lick"           -0.14
    "shot"           -0.14
    " ic"            -0.14
    "ses"            -0.14
    "cul"            -0.14
    "rray"           -0.14
    " barn"          -0.13
    Positive Logits
    Token            Logit
    "aft"            0.16
    "bote"           0.15
    "zburg"          0.15
    "asca"           0.15
    " Zhu"           0.15
    "scriber"        0.15
    "kv"             0.14
    "verture"        0.14
    "итор"           0.14
    " Contributors"  0.14
    Activation Density: 0.081%

    No Known Activations