INDEX
    Explanations

    phrases related to online behavior, social activism, and specific names or terms

    NEGATIVE LOGITS
    "naire"         -0.79
    "eous"          -0.78
    " bunk"         -0.76
    "BOX"           -0.69
    " Opera"        -0.68
    " Murd"         -0.68
    "culosis"       -0.68
    " Kaepernick"   -0.68
    "IFIED"         -0.67
    " Robbins"      -0.67
    POSITIVE LOGITS
    "aviour"    1.61
    "avior"     1.15
    "reath"     1.02
    "beh"       0.96
    "abus"      0.92
    "cipl"      0.91
    "assing"    0.91
    "anging"    0.90
    "avin"      0.89
    "olic"      0.89
    Act Density 7.222%
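
A minimal sketch of how an activation density figure like the one above could be computed, assuming density means the percentage of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from this page; the dashboard's actual definition and sampling procedure are not stated here.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The page itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical per-token activations for one feature (illustrative only).
sample = [0.0, 0.0, 1.3, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3f}%")  # prints 20.000%
```

Under this reading, a density of 7.222% would mean the feature was active on roughly 7 of every 100 sampled tokens.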

    No Known Activations