INDEX
    Explanations

    words related to ethics, justice, and deserving actions or outcomes

    Negative Logits
    ullivan         -0.68
    ula             -0.62
    shr             -0.60
    edd             -0.59
     plateau        -0.58
    cross           -0.57
     WI             -0.57
     Sidd           -0.57
     cycl           -0.56
     Es             -0.56
    Positive Logits
    arna            0.90
     applause       0.85
     precedence     0.84
     attention      0.81
    FINE            0.81
     credit         0.78
     consideration  0.77
     praise         0.75
     dignity        0.74
     scrutiny       0.73
    Act Density 0.017%

    No Known Activations