    Explanations

    instances of threats and negative commentary related to gender dynamics

    Negative Logits
        " ê²ģ"                -0.14
        "imi"                 -0.14
        "umni"                -0.14
        ".showError"          -0.14
        "imos"                -0.13
        "ActivityIndicator"   -0.13
        "upe"                 -0.13
        "ALAR"                -0.13
        "tape"                -0.13
        "_STRIP"              -0.13
    Positive Logits
        " trolls"             0.36
        " trolling"           0.35
        " troll"              0.32
        " hat"                0.29
        " cyber"              0.29
        " vit"                0.28
        " Troll"              0.27
        " mean"               0.27
        " online"             0.26
        " keyboard"           0.24
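Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and taking the lowest and highest entries. The sketch below assumes that method; the page itself does not state it. `top_logit_effects`, `w_dec_row`, `W_U` and the toy vocabulary are hypothetical names and values chosen to mirror the listing's format, not the real model's weights.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper (assumed method): project one feature's decoder
    direction through the unembedding and return the k most negative and
    k most positive tokens with their logit values."""
    logits = w_dec_row @ W_U            # one value per vocabulary token
    order = np.argsort(logits)          # indices sorted ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy inputs: d_model = 2, vocabulary of 4 tokens (leading spaces are part of the token).
vocab = [" trolls", "imi", " hat", "tape"]
W_U = np.array([[0.36, -0.14, 0.29, -0.13],
                [0.50,  0.50, 0.50,  0.50]])
w_dec_row = np.array([1.0, 0.0])        # selects the first row of W_U

neg, pos = top_logit_effects(w_dec_row, W_U, vocab, k=2)
print(neg)  # [('imi', -0.14), ('tape', -0.13)]
print(pos)  # [(' trolls', 0.36), (' hat', 0.29)]
```

Under this reading, a positive value means the feature's direction raises that token's logit when added to the residual stream, and a negative value means it lowers it.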
    Activation Density: 0.045%
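Activation density is read here as the share of token positions on which the feature's activation is above zero. That definition is an assumption, not something the page states. On that reading, 0.045% means the feature fires on roughly 1 token in 2,200. `activation_density` and the toy activation array are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper (assumed definition): fraction of token positions
    whose activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Toy data: 20,000 token positions, the feature is active on 9 of them.
acts = np.zeros(20_000)
acts[:9] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.045%
```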

    No Known Activations