INDEX
    Explanations

    social media platform mentions

    references to social media platforms, particularly Pinterest and Twitter

    Negative Logits
    "thro"        -0.81
    "pill"        -0.69
    "iren"        -0.68
    "hid"         -0.67
    " pill"       -0.67
    " spoiler"    -0.66
    "©¶æ¥µ"       -0.65
    "liner"       -0.63
    "pled"        -0.62
    "lie"         -0.62
    Positive Logits
    " Pinterest"     0.85
    " PHOTO"         0.84
    " Photograph"    0.84
    " Images"        0.80
    " IMAGES"        0.77
    "atoon"          0.74
    " Sergeant"      0.71
    "iewicz"         0.69
    " Painting"      0.67
    " Pic"           0.66
    Activation Density: 0.014%

    No Known Activations