    Explanations

    negative actions or behaviors related to attacks on reputation, characterized by words like "smear," "slander," "defamation," and "distortions."

    terms associated with attacks on reputation and character

    Negative Logits
    Token          Logit
    "jo"           -0.78
    "iration"      -0.74
    " Wond"        -0.74
    " Ele"         -0.74
    " autom"       -0.73
    "hook"         -0.72
    " Happiness"   -0.72
    "HT"           -0.72
    " aw"          -0.69
    " Zen"         -0.68
    Positive Logits
    Token              Logit
    " smear"           3.21
    " slander"         1.77
    " libel"           1.73
    " defamation"      1.72
    " disinformation"  1.62
    " misinformation"  1.61
    " distort"         1.60
    " wedge"           1.55
    " distortion"      1.53
    " distortions"     1.51
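Dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then keeping the highest-scoring tokens (positive logits) and the lowest-scoring ones (negative logits). The sketch below is a minimal illustration under that assumption. The names `decoder_direction`, `unembed`, `logit_effects` and `top_and_bottom` are hypothetical, the toy vectors are made up, and this is not necessarily the exact pipeline that produced the numbers above.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_direction: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Score each token by the dot product of the (hypothetical) feature
    # decoder vector with that token's unembedding column.
    return {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembed.items()
    }


def top_and_bottom(scores: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    # Sort tokens from most boosted to most suppressed.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    positive = ranked[:k]
    # Most negative first, matching how the negative list is usually shown.
    negative = sorted(ranked[-k:], key=lambda item: item[1])
    return positive, negative


# Toy 3-dimensional example with made-up vectors.
direction = [1.0, 0.5, -0.2]
unembed = {
    " smear": [2.0, 1.0, 0.0],
    " slander": [1.0, 1.0, 0.5],
    " Zen": [-0.5, 0.0, 1.0],
    "jo": [-1.0, 0.2, 0.5],
}
scores = logit_effects(direction, unembed)
positive, negative = top_and_bottom(scores, k=2)
print(positive)
print(negative)
```

In this toy setup " smear" and " slander" land in the positive list, while "jo" and " Zen" land in the negative list.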
    Activation density: 0.051%
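Activation density is usually read as the fraction of token positions on which the feature fires, meaning its activation is above zero. The sketch below assumes that definition. The function name `activation_density`, the `threshold` parameter and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    # Fraction of token positions where the feature activation exceeds the threshold.
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the feature fires on 1 of 2,000 tokens.
sample = [0.0] * 1999 + [3.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, the reported 0.051% means the feature is active on roughly 1 token in 2,000 (1 / 0.00051 ≈ 1,961).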

    No Known Activations