INDEX
    Explanations

    terms related to negative events or actions

    Negative Logits
    ellen            -0.64
    eatures          -0.63
    Different        -0.61
    ptin             -0.60
    cription         -0.60
     cylinders       -0.60
     adjusting       -0.59
    meric            -0.59
    ynthesis         -0.57
    ready            -0.57
    Positive Logits
     embarrassment   1.15
     endanger        1.08
     harm            1.04
    angering         1.04
     jeopard         1.03
     inconvenience   0.97
     fate            0.96
     havoc           0.95
     tragedies       0.95
     consequences    0.93
    Activation Density 0.538%

    No Known Activations