INDEX
    Explanations

    words related to unethical or illegal behavior, specifically misconduct

    instances of the word "misconduct" and related phrases

    Negative Logits
    Token          Logit
    "LM"           -0.76
    "hered"        -0.73
    " Sabha"       -0.69
    " Lear"        -0.68
    "ebus"         -0.68
    "zan"          -0.66
    "izen"         -0.64
    " Collider"    -0.63
    "oos"          -0.63
    " Juliet"      -0.63

    Positive Logits
    Token          Logit
    "owship"       0.99
    " discharge"   0.76
    "utes"         0.73
    "onduct"       0.71
    "uracy"        0.69
    " misconduct"  0.69
    "aunders"      0.68
    "orem"         0.66
    "eatures"      0.66
    "misc"         0.66

    Activation Density: 0.037%

    No Known Activations