INDEX
    Explanations

    phrases related to legal actions and consequences

    references to legal and ethical issues

    Negative Logits
    Token            Logit
    "uart"           -0.71
    "leans"          -0.67
    "ridor"          -0.61
    "ortment"        -0.60
    "visual"         -0.59
    "cean"           -0.59
    "knit"           -0.58
    " prepar"        -0.58
    "ript"           -0.58
    " resid"         -0.57

    Positive Logits
    Token            Logit
    " unjust"         0.83
    " bullies"        0.82
    " injustice"      0.74
    " tresp"          0.73
    " cowardly"       0.73
    " slander"        0.71
    " abusive"        0.70
    " unfair"         0.69
    " harassment"     0.69
    " merciless"      0.68
    Activation Density: 0.790%

    No Known Activations