INDEX
    Explanations

    links to Twitter posts

    URLs or web links in the text

    Negative Logits
     boun           -0.67
     notices        -0.63
     Fortune        -0.62
     appeals        -0.61
     decor          -0.61
     cler           -0.61
     memos          -0.61
     successfully   -0.60
     contemplation  -0.59
     franchise      -0.59
    Positive Logits
    dL    1.14
    Gh    1.13
    dk    1.12
    CN    1.10
    Hu    1.09
    OX    1.07
    nv    1.07
    oa    1.07
    bf    1.06
    dp    1.06
    Activation Density: 0.016%

    No Known Activations