    Explanations

    The neuron responds to the quoted instruction-prefix pattern used in prompt-injection directives, e.g. double quotes around a persona name followed by a colon, as in “Bo i Bot:”.

    Unfiltered, sexually explicit conversations.

    Negative Logits
    Token              Logit
    "cow"              -0.06
    " pais"            -0.06
    "Roman"            -0.06
    "řez"              -0.06
    " vascular"        -0.06
    " END"             -0.06
    " उप"              -0.06
    "_p"               -0.06
    "-rec"             -0.06
    " Vend"            -0.06

    Positive Logits
    Token              Logit
    " differed"        0.08
    "ROLLER"           0.07
    " differences"     0.07
    " Sellers"         0.07
    " kernels"         0.07
    "\ttrigger"        0.06
    " Collaboration"   0.06
    "stable"           0.06
    "ARD"              0.06
    "]*"               0.06

    Act Density 0.005%

    No Known Activations