INDEX
    Explanations

    specific words related to societal norms

    references to societal norms and expectations

    Negative Logits
    Token        Logit
    "head"       -0.73
    "wen"        -0.72
    "RET"        -0.66
    "iddler"     -0.64
    "lees"       -0.64
    " Hidden"    -0.64
    " Died"      -0.63
    " del"       -0.63
    "zz"         -0.63
    "lee"        -0.63
    Positive Logits
    Token              Logit
    " norms"           3.65
    " norm"            1.97
    " normative"       1.66
    " conventions"     1.60
    "norm"             1.59
    " stereotypes"     1.41
    " standards"       1.38
    " ideals"          1.38
    "Norm"             1.37
    " expectations"    1.36
    Act Density 0.017%

    No Known Activations