INDEX
    Explanations

    descriptions of actions or situations that are considered unacceptable

    terms related to unacceptability and moral judgment

    Negative Logits
    " Insight"     -0.70
    "craft"        -0.66
    " Fortune"     -0.65
    " Mov"         -0.65
    "mus"          -0.63
    "ier"          -0.63
    "stone"        -0.60
    " Born"        -0.60
    " Heal"        -0.59
    " Speed"       -0.59
    Positive Logits
    " unacceptable"     3.36
    " intolerable"      2.29
    "acceptable"        1.98
    " undesirable"      1.84
    " unsustainable"    1.72
    " inappropriate"    1.66
    " objectionable"    1.65
    " appalling"        1.61
    " unbearable"       1.61
    " acceptable"       1.57
    Activation Density 0.021%

    No Known Activations