INDEX
    Explanations

    references to violence and physical confrontations

    Negative Logits
    Token            Logit
    " للمعارف"       -0.42
    "SpringBootTest" -0.41
    "Espèce"         -0.40
    " kasarigan"     -0.40
    "########."      -0.40
    "pleaños"        -0.39
    "dstuk"          -0.39
    "expandindo"     -0.38
    "vician"         -0.38
    "niczy"          -0.37
    Positive Logits
    Token            Logit
    " brawl"         0.61
    " fight"         0.58
    "fight"          0.57
    " unarmed"       0.57
    " fist"          0.55
    "fights"         0.54
    " fights"        0.53
    " altercation"   0.53
    " fists"         0.52
    " Fights"        0.52
    Activation Density: 0.447%

    No Known Activations