INDEX
    Explanations

    references to social media activity, particularly sharing of images and captions

    Negative Logits
    Token           Logit
    " Ri"           -0.17
    "ling"          -0.16
    "etting"        -0.16
    "ãĥĵãĥ¼"        -0.16
    "ụy"            -0.15
    "rips"          -0.15
    "zek"           -0.14
    " constructs"   -0.14
    " Pap"          -0.14
    "icho"          -0.14

    Positive Logits
    Token           Logit
    " Humanities"   0.16
    "ãĥ©ãĥĥãĤ¯"     0.15
    "ftime"         0.15
    "Sensitive"     0.14
    " ÑĤÑĢÑĥ"       0.14
    "938"           0.14
    "iges"          0.14
    "éĿ"            0.14
    "osaur"         0.14
    "cea"           0.14

    Activation Density: 0.022%

    No Known Activations