INDEX
    Explanations

    The neuron fires on words that denote cruelty, violence, or sadistic infliction of pain.

    New Auto-Interp
    Negative Logits
     Florian
    -0.07
     Transportation
    -0.07
     Matte
    -0.07
     Doug
    -0.06
     still
    -0.06
     phi
    -0.06
    ]))
    -0.06
    "fmt
    -0.06
     descendant
    -0.06
     incontr
    -0.06
    POSITIVE LOGITS
    0.08
     видов
    0.08
     инт
    0.07
     Sab
    0.07
     AB
    0.07
     mim
    0.07
    Guid
    0.06
    151
    0.06
    MEM
    0.06
    mak
    0.06
    Act Density 0.008%

    No Known Activations