INDEX
    Explanations

    This neuron detects explicit sexual content and erotic requests—especially language about nudity, sexual acts, or coercive/abusive sexual behavior.

    Third-person human-referent pronouns, especially object and plural forms indicating people.

    Negative Logits
    visited       -0.07
    _UNDEFINED    -0.06
    ones          -0.06
    national      -0.06
    Def           -0.06
    Last          -0.06
    Warn          -0.06
     Niagara      -0.06
    cone          -0.06
    seq           -0.06
    Positive Logits
    'i            0.07
     tohoto       0.07
    ||↵           0.07
    	root         0.06
     stato        0.06
    (priority     0.06
    ’dan          0.06
    ги            0.06
    개의          0.06
     sayf         0.06
    Act Density 0.237%

    No Known Activations