INDEX
    Explanations

    The neuron activates on words and phrases related to diversity, equity, empowerment, inclusion, and other social-values principles.

    Negative Logits
    Token          Logit
    _coverage      -0.07
    Hair           -0.07
     слід          -0.06
                   -0.06
    _bundle        -0.06
     topo          -0.06
    _users         -0.06
    _Level         -0.06
     setName       -0.06
     Swan          -0.06
    Positive Logits
    Token          Logit
    ائی            0.07
    _PAD           0.07
    ASN            0.07
    그러           0.07
    Ğ              0.06
    reserved       0.06
     particularly  0.06
    .VK            0.06
    тен            0.06
    -Col           0.06
    Activation Density: 0.057%

    No Known Activations