    Explanations

    mentions of unethical behavior and discrimination in social contexts

    Negative Logits
     helicop        -0.66
     cryst          -0.61
    artney          -0.61
     respectively   -0.59
    oother          -0.58
    itored          -0.58
    combe           -0.58
    querque         -0.57
     prest          -0.57
     challeng       -0.56
    Positive Logits
     coward         0.76
     hypocrisy      0.72
     modesty        0.68
     liberals       0.67
     dare           0.66
     bigotry        0.66
     blaming        0.66
     feminists      0.65
     offended       0.65
     dishon         0.64
    Activation Density: 0.523%

    No Known Activations