    Explanations

    word sequences related to harassment, discrimination, abuse, violence and assault

    references to harassment, violence, and related abuses

    Negative Logits
    Token                    Logit
    "liam"                   -0.78
    " Bucket"                -0.77
    "DragonMagazine"         -0.74
    " Tycoon"                -0.74
    "Legendary"              -0.71
    "ernels"                 -0.71
    " Compact"               -0.71
    "*/("                    -0.69
    "natureconservancy"      -0.68
    "phalt"                  -0.67

    Positive Logits
    Token                    Logit
    " intimidation"           1.63
    " harassment"             1.52
    " retaliation"            1.49
    " bullying"               1.43
    " discrimination"         1.42
    " coercion"               1.36
    " harassing"              1.34
    " unwelcome"              1.31
    " derogatory"             1.31
    " threats"                1.27
    Activation Density: 0.322%

    No Known Activations