INDEX
    Explanations

    phrases related to moral or ethical judgments

    Negative Logits
    ilant           -0.88
    craft           -0.80
    lets            -0.77
    avers           -0.76
    oling           -0.75
    ocket           -0.74
    cest            -0.73
    planes          -0.72
    yss             -0.71
    frey            -0.71

    Positive Logits
     deviations     0.86
     behaviour      0.80
     Danger         0.78
     behavior       0.77
    ible            0.75
     compromises    0.74
     standards      0.74
     norms          0.73
     srfAttach      0.72
     acceptable     0.71
    Activation Density: 0.047%

    No Known Activations