INDEX
    Explanations

    links to images, specifically Twitter image links

    punctuation marks, specifically periods

    Negative Logits
     audits             -0.69
     involuntary        -0.66
     volunt             -0.64
     confidentiality    -0.62
     indemn             -0.62
     tsun               -0.61
     conformity         -0.60
     equilibrium        -0.60
     interchange        -0.60
     disadvant          -0.59
    Positive Logits
    twitter              1.66
    facebook             1.13
    imgur                1.08
    google               1.05
    twitch               1.02
    redd                 0.94
    youtube              0.93
    reddit               0.90
    blogspot             0.90
    github               0.87
    Activation Density 0.026%

    No Known Activations