    Explanations

    mentions of bullying and related behaviors

    Negative Logits
    "rahim"    -0.18
    "boom"     -0.15
    "adio"     -0.15
    "окол"     -0.14
    "pong"     -0.14
    " thất"    -0.14
    "ocaly"    -0.13
    " Baby"    -0.13
    "obi"      -0.13
    "zig"      -0.13
    Positive Logits
    " bull"        0.63
    " Bul"         0.59
    "bul"          0.56
    " Bull"        0.54
    " bullying"    0.52
    " bully"       0.51
    "bull"         0.47
    " bul"         0.45
    " bullied"     0.43
    " cyber"       0.41
    Activation Density: 0.038%

    No Known Activations