    Explanations

    references to social media platforms, particularly Facebook

    Negative Logits
    Token             Logit
    " practicing"     -0.66
    "rero"            -0.65
    "1001"            -0.62
    " Hof"            -0.62
    " brim"           -0.61
    "stood"           -0.60
    " crank"          -0.60
    "VERT"            -0.60
    "neg"             -0.59
    "ilk"             -0.59

    Positive Logits
    Token             Logit
    "Facebook"        0.93
    "Twitter"         0.88
    "ileaks"          0.86
    " Messenger"      0.82
    "emouth"          0.81
    "twitter"         0.80
    "imil"            0.76
    " ank"            0.76
    "culosis"         0.72
    " Features"       0.72

    Activation Density: 0.010%

    No Known Activations