    Explanations

    expressions of moral judgment regarding actions and societal norms

    Negative Logits
    Token                   Logit
    " AssemblyCulture"      -0.79
    "IntoConstraints"       -0.78
    "parsedMessage"         -0.75
    " متعلقه"               -0.75
    "webElementXpaths"      -0.74
    "ValueStyle"            -0.73
    " utafitiHapana"        -0.72
    "ódó"                   -0.71
    "хьтан"                 -0.69
    " виправивши"           -0.68

    Positive Logits
    Token                   Logit
    " people"               0.79
    " sometimes"            0.66
    " stereotypes"          0.65
    " stereotype"           0.64
    "people"                0.60
    " ignorant"             0.59
    "有些人"                0.59
    " often"                0.59
    " misunderstand"        0.57
    " Often"                0.57
    Activation Density: 0.799%

    No Known Activations