    Explanations

    The neuron activates on words expressing human dignity and autonomy, and on related concepts of respect and rights.

    Negative Logits
        Token           Logit
        ".must"         -0.07
        " lookout"      -0.06
        " related"      -0.06
        "-running"      -0.06
        "์ว"            -0.06
        " federation"   -0.06
        " leftovers"    -0.06
        " otros"        -0.06
        " secretive"    -0.06
        " clich"        -0.06
    Positive Logits
        Token           Logit
        " dignity"       0.13
        " dign"          0.09
        "_UPPER"         0.07
        " indign"        0.07
        " Agricult"      0.07
        " Signature"     0.06
        "↵    ↵↵"        0.06
        "ุณ"             0.06
        ":!"             0.06
        ".di"            0.06
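    Token logit lists like these are commonly produced by projecting a neuron's output weights through the model's unembedding matrix (direct logit attribution) and keeping the k most positive and k most negative tokens. The page does not say how its values were computed, so the sketch below assumes that method. The names `w_out`, `w_unembed`, `direct_logit_effects`, and `top_and_bottom` are hypothetical, and the numbers are toy values, not this neuron's real weights.

```python
import heapq
from typing import List, Sequence, Tuple

TokenEffect = Tuple[str, float]


def direct_logit_effects(
    w_out: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[TokenEffect]:
    """Return (token, effect) pairs for the product w_out @ w_unembed.

    Assumption: w_out is the neuron's output weight vector (length d_model)
    and w_unembed has d_model rows, each of length len(vocab).
    """
    if len(w_out) != len(w_unembed):
        raise ValueError("w_out and w_unembed must share d_model")
    effects = [0.0] * len(vocab)
    for weight, row in zip(w_out, w_unembed):
        if len(row) != len(vocab):
            raise ValueError("each unembedding row must have len(vocab) entries")
        for j, u in enumerate(row):
            effects[j] += weight * u  # accumulate the dot product for token j
    return list(zip(vocab, effects))


def top_and_bottom(
    effects: Sequence[TokenEffect], k: int = 10
) -> Tuple[List[TokenEffect], List[TokenEffect]]:
    """Return the k most positive and the k most negative token effects."""
    positive = heapq.nlargest(k, effects, key=lambda te: te[1])
    negative = heapq.nsmallest(k, effects, key=lambda te: te[1])
    return positive, negative


# Toy example with made-up numbers (not this neuron's real weights).
vocab = [" dignity", " lookout", " the"]
w_out = [1.0, 0.5]
w_unembed = [
    [2.0, -1.0, 0.0],
    [1.0, -2.0, 0.5],
]
effects = direct_logit_effects(w_out, w_unembed, vocab)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

    With a real model, `k=10` would yield ten-entry lists like the ones on this page; the toy vocabulary here has only three tokens, so some entries appear in both lists.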
    Activation Density: 0.003%
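    If activation density is the fraction of tokens on which the neuron is active, 0.003% corresponds to about 3 tokens in every 100,000. A minimal sketch of that computation follows. It assumes "active" means an activation strictly above zero, because the page does not state its threshold. The function name `activation_density` and the toy data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    return active / total if total else 0.0


# Toy data: 3 active tokens out of 100,000.
acts = [0.0] * 99_997 + [1.2, 0.4, 3.1]
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # density expressed as a percentage
```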

    No Known Activations