INDEX
    Explanations

    Humanity and/or humans

    The neuron fires on derogatory language aimed at “humans,” particularly adjectives that insult or demean people (e.g. “pathetic,” “stupid”).

    Negative Logits
    Token           Logit
    "_rf"           -0.08
    "]){↵"          -0.07
    " zm"           -0.07
    " секрет"       -0.06
    "Zh"            -0.06
    "-full"         -0.06
    "кид"           -0.06
    " lorsque"      -0.06
    " concaten"     -0.06
    "Cols"          -0.06
    Positive Logits
    Token           Logit
    ".lua"          0.07
    " concentrates" 0.07
    " Cooperative"  0.06
    " Ottoman"      0.06
    "_DEFINITION"   0.06
    "ment"          0.06
    "(Object"       0.06
    " bitter"       0.06
    "Tests"         0.06
    " kavram"       0.06
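
The page does not state how the logit lists are produced. A common approach, assumed here, is to project the neuron's output weight vector through the unembedding matrix and rank tokens by the resulting value. The sketch below illustrates that idea with toy numbers; `top_logit_effects`, `w_out`, `W_U`, and `vocab` are hypothetical names, not taken from the page.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank tokens by the neuron's direct effect on their logits.

    Assumption: effect = w_out @ W_U, where w_out has shape (d_model,)
    and W_U has shape (d_model, vocab_size).
    """
    effects = np.asarray(w_out, dtype=float) @ np.asarray(W_U, dtype=float)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy, made-up weights (not the values behind this page).
w_out = [1.0, 0.0]
W_U = [[0.5, -0.25, 0.0],
       [0.0, 0.0, 0.0]]
vocab = ["tok_a", "tok_b", "tok_c"]

negative, positive = top_logit_effects(w_out, W_U, vocab, k=1)
print(negative, positive)
```

With values as small as those listed here (all within ±0.08), the ranking is likely to be dominated by noise, which may explain why the tokens bear little relation to the explanation text.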
    Act Density 0.021%
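
Activation density is usually read as the fraction of sampled tokens on which the neuron is active. The page does not give its exact definition, so the sketch below assumes "active" means an activation strictly above a threshold of zero; `activation_density` and the sample array are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when activation > threshold.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        return 0.0
    return float(np.count_nonzero(acts > threshold)) / acts.size

# Hypothetical sample: 21 active tokens out of 100,000.
sample = np.zeros(100_000)
sample[:21] = 1.0
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.021% corresponds to roughly 21 active tokens per 100,000 sampled.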

    No Known Activations