INDEX
    Explanations

    mentions of physical violence, specifically instances of being physically attacked or harmed

    instances of physical violence or assault

    Negative Logits
    Token       Logit
    "orrow"     -0.89
    "isse"      -0.75
    "gravity"   -0.74
    "oplan"     -0.73
    "ortium"    -0.72
    " facult"   -0.72
    "FF"        -0.70
    "alg"       -0.67
    "ordan"     -0.67
    " rover"    -0.67
    Positive Logits
    Token       Logit
    " beaten"   1.09
    "beat"      0.99
    " beating"  0.86
    "¶æ"        0.82
    "boxing"    0.80
    "down"      0.79
    "soever"    0.77
    "Beat"      0.76
    " Beat"     0.75
    " beat"     0.75
    Activation Density: 0.018%

    No Known Activations