INDEX
    Explanations

    phrases related to moral judgments and opinions

    expressions related to the concept of acceptability or unacceptability

    Negative Logits
    Token         Logit
    "craft"       -0.82
    "enfranch"    -0.78
    "dream"       -0.77
    "ilant"       -0.76
    "planes"      -0.76
    "ynthesis"    -0.74
    "ocket"       -0.72
    "wright"      -0.72
    "frey"        -0.72
    "lets"        -0.72
    Positive Logits
    Token             Logit
    " deviations"     0.79
    " CPC"            0.72
    "ible"            0.71
    " standards"      0.71
    " Danger"         0.71
    " srfAttach"      0.70
    "itable"          0.70
    " compromises"    0.69
    "Gi"              0.69
    " norms"          0.69
    Activation Density: 0.037%

    No Known Activations