    Explanations

    words related to negative behaviors or actions, especially harassment

    instances of the term "harassment" and related contexts

    Negative Logits
    "éĹĺ"           -0.91
    "stanbul"       -0.79
    "rient"         -0.77
    "zyme"          -0.77
    "ACTED"         -0.74
    "ECD"           -0.73
    "iets"          -0.71
    "inet"          -0.69
    "arch"          -0.68
    "kos"           -0.68
    Positive Logits
    " harassment"   1.00
    " harass"       0.96
    " harassing"    0.93
    " accus"        0.89
    " stalking"     0.82
    " tactics"      0.81
    " harassed"     0.80
    "assment"       0.78
    " leveled"      0.72
    " inflic"       0.72
    Activation Density: 0.030%

    No Known Activations