INDEX
    Explanations

    Code or writing formatting

    This neuron responds to special “jailbreak” prompt markers (e.g. the bracketed “[JAILBREAK]” token) used in DAN-style instructions.

    Negative Logits
    Token                   Logit
    -campus                 -0.07
    xt                      -0.07
     CircularProgress       -0.07
    よく                    -0.07
    XD                      -0.07
                            -0.07
     yaygın                 -0.06
    .AbsoluteConstraints    -0.06
    XT                      -0.06
     racer                  -0.06
    Positive Logits
    Token                   Logit
    =!                      0.07
    他们                    0.07
    (',')                   0.07
    arkers                  0.06
    .avi                    0.06
    .Constants              0.06
     Patterns               0.06
    .examples               0.06
     requisite              0.06
    .eps                    0.06
    Act Density 0.001%

    No Known Activations