INDEX
    Explanations

    written documents such as letters, memos, articles, or blog posts

    references to various types of documents and communications, such as letters and memos

    Negative Logits
    "instead"        -0.67
    "cause"          -0.61
    " outweigh"      -0.60
    "ctrl"           -0.60
    "artifacts"      -0.58
    " injust"        -0.58
    "tics"           -0.56
    "illard"         -0.56
    " despise"       -0.56
    "animate"        -0.56

    Positive Logits
    " nutshell"       0.84
    "idav"            0.74
    " announcing"     0.73
    " interview"      0.69
    " titled"         0.68
    " HuffPost"       0.67
    " emailed"        0.65
    "fter"            0.64
    " published"      0.64
    " released"       0.64

    Activation Density: 0.120%

    No Known Activations