    Explanations

    words related to negative impact or harm

    expressions related to causing harm or damage

    Negative Logits
    mad        -0.72
    uls        -0.71
    dding      -0.67
    uesday     -0.66
    ãĥ¼ãĥ³     -0.66
    igers      -0.66
    ricks      -0.65
    ellen      -0.65
    odor       -0.63
    leans      -0.63
    Positive Logits
     havoc            1.11
     credibility      1.02
     delicate         0.96
     morale           0.94
     livelihood       0.90
     sensibilities    0.90
     friendships      0.88
     integrity        0.88
     innocent         0.87
     morals           0.85
    Act Density 0.229%

    No Known Activations