INDEX
    Explanations

    aggressive and confrontational language

    angry, aggressive dialogue with profanity and hostile commands directed at someone.

    Negative Logits
     فريبيس
    -0.71
     الرياضيه
    -0.66
     autorytatywna
    -0.57
     kasarigan
    -0.54
    AsUp
    -0.52
     تانيه
    -0.50
     probable
    -0.49
     Probable
    -0.49
    怎麼辦
    -0.48
    Referències
    -0.48
    Positive Logits
     dare
    0.52
     insol
    0.47
     fucking
    0.43
    Shut
    0.42
     arrogant
    0.40
     shut
    0.39
     impert
    0.39
    0.38
    0.38
     dared
    0.38
    Activation Density: 0.191%

    No Known Activations