INDEX
    Explanations

    actions and emotions

    This neuron primarily detects explicit sexual or pornographic terms.

    Negative Logits
    Token             Logit
    "agger"           -0.07
    "URIComponent"    -0.07
    " Executors"      -0.07
    "issa"            -0.07
    " chatting"       -0.07
    "LineColor"       -0.07
    " camp"           -0.06
    " millet"         -0.06
    "/r"              -0.06
    " an"             -0.06
    Positive Logits
    Token             Logit
    " Epic"           0.06
    (blank)           0.06
    "gment"           0.06
    " [_"             0.06
    " ดร"             0.06
    "оюз"             0.06
    " prefix"         0.06
    " listOf"         0.06
    " flipped"        0.06
    " convenience"    0.06
    Activation Density: 0.005%

    No Known Activations