INDEX
    Explanations

    threatening language and confrontational interactions

    Negative Logits
    Token              Logit
    " broadly"         -0.86
    " markedly"        -0.85
    " strikingly"      -0.83
    "xtap"             -0.80
    " outset"          -0.77
    " Historically"    -0.76
    " reliance"        -0.75
    " bolstered"       -0.74
    " policymakers"    -0.74
    " principally"     -0.72
    Positive Logits
    Token              Logit
    " fuckin"          1.65
    " fucking"         1.52
    " shit"            1.50
    " gonna"           1.40
    " bitch"           1.39
    " crap"            1.35
    " fucked"          1.35
    " fuck"            1.32
    " shitty"          1.31
    " haha"            1.30
    Act Density 11.023%
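
    As a rough illustration of how an activation density figure like the one above could be read, the sketch below assumes it is the percentage of sampled token positions where the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample values are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: density = active positions / total positions, as a percentage.
    """
    if not activations:
        # No sampled positions: report zero rather than dividing by zero.
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical per-token activations for one feature (made-up values).
sample = [0.0, 0.0, 1.2, 0.0, 0.4, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3f}%")  # prints "25.000%"
```

    Under this reading, a density of 11.023% would mean the feature is active on roughly one in nine sampled token positions.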

    No Known Activations