    Explanations

    words related to negative social behavior or mistreatment of individuals

    mentions of harassment and related behaviors

    Negative Logits
    éĹĺ           -0.99
    icts          -0.78
    inet          -0.78
    zyme          -0.75
    ACTED         -0.73
    chart         -0.72
    arch          -0.71
    lined         -0.71
    essential     -0.70
    shows         -0.70
    Positive Logits
     harass        0.91
     harassment    0.90
     harassing     0.87
     harassed      0.78
    assment        0.76
     stalking      0.75
     accus         0.75
     tactics       0.73
    ãĥĨ            0.71
    ingly          0.67
    Activation Density: 0.026%

    No Known Activations