    Explanations

    highly belligerent and confrontational language

    terms related to conflicts or warfare

    Negative Logits
    " Tammy"       -0.64
    " chops"       -0.64
    " trophies"    -0.61
    " bills"       -0.61
    " Lake"        -0.59
    " Mozilla"     -0.59
    "Lake"         -0.59
    " hoped"       -0.58
    " hugs"        -0.57
    " sm"          -0.57
    Positive Logits
    "erent"        4.91
    "eren"         1.24
    "erential"     1.23
    "arent"        1.11
    "erence"       1.07
    "iliar"        1.05
    "arant"        1.04
    "ividual"      1.02
    "minist"       1.00
    "erest"        0.98
    Act Density 0.013%

    No Known Activations