INDEX
    Explanations

    negative descriptors or adjectives associated with moral judgment

    Negative Logits
    Token               Logit
    "exitRule"          -0.72
    " IFA"              -0.65
    " arşivlendi"       -0.63
    " EEU"              -0.62
    "IFA"               -0.62
    " Seitz"            -0.59
    "igy"               -0.57
    "nasium"            -0.56
    "nup"               -0.56
    " Wikimedijinoj"    -0.56
    Positive Logits
    Token               Logit
    " Wicked"           1.87
    "Wicked"            1.70
    "wicked"            1.69
    " wicked"           1.68
    " wickedness"       0.96
                        0.48
                        0.44
    " vicious"          0.43
    " Wild"             0.43
    " nonlinear"        0.43
    Activation Density: 0.001%

    No Known Activations