INDEX
    Explanations

    This neuron activates on aggressive or insulting language that casts people or groups as enemies or as deserving of vilification.

    Negative Logits
                -0.07
     císa       -0.06
    بوب         -0.06
    187         -0.06
     graphical  -0.06
    одатель     -0.06
     fingert    -0.06
    िच          -0.06
    .Logf       -0.06
     defender   -0.06
    Positive Logits
    (exc        0.06
    ]|[         0.06
     looph      0.06
    _sys        0.06
     canlı      0.06
     глу        0.06
    Column      0.06
    Basically   0.06
    ::__        0.06
     akin       0.05
    Activation Density: 0.254%

    No Known Activations