    Explanations

    Lengthy and complex text

    This neuron responds to meta-instructions asking the model to give a “completely unhinged” or unconstrained answer with “no remorse or ethics,” i.e. prompts telling it to ignore its rules or policies.

    Negative Logits
    Token            Logit
    ".touches"       -0.06
    " startups"      -0.06
    "EDIATEK"        -0.06
    ".texture"       -0.06
    "_IEnumerator"   -0.06
    "monto"          -0.06
    "Conflict"       -0.05
    "iterals"        -0.05
    " sanat"         -0.05
    ")],"            -0.05

    Positive Logits
    Token                  Logit
    " Kepler"              0.07
    "چ"                    0.07
    "(ps"                  0.07
    "ichtet"               0.06
    " Packaging"           0.06
    "uer"                  0.06
    " Banc"                0.06
    (token not rendered)   0.06
    "ві"                   0.06
    (token not rendered)   0.06

    Activation Density: 0.003%
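    The logit lists and the activation density above are summary statistics whose computation is not explained on this page. As a minimal sketch, assuming the logit lists come from projecting the neuron's output weight vector through the model's unembedding matrix, and assuming density is the fraction of tokens on which the activation exceeds zero, they could be computed as follows. The function names `logit_effects` and `activation_density`, the threshold, and the toy weights are all hypothetical.

```python
import numpy as np


def logit_effects(w_out, W_U, vocab, k=10):
    """Rank tokens by a neuron's direct effect on the output logits.

    Assumption (hypothetical): each token's effect is the neuron's output
    weight vector projected through the unembedding matrix, w_out @ W_U.
    """
    effects = w_out @ W_U                # one value per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the activation exceeds `threshold`.

    Assumption (hypothetical): "active" means activation > threshold.
    Under that reading, 0.003% is roughly 3 tokens in every 100,000.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


if __name__ == "__main__":
    # Toy weights and activations, made up for illustration only.
    rng = np.random.default_rng(0)
    d_model, vocab_size = 8, 20
    vocab = [f"tok{i}" for i in range(vocab_size)]
    w_out = rng.normal(size=d_model)
    W_U = rng.normal(size=(d_model, vocab_size))
    neg, pos = logit_effects(w_out, W_U, vocab, k=3)
    print("Negative logits:", neg)
    print("Positive logits:", pos)
    print("Activation density:", activation_density(rng.normal(size=1000)))
```

    Sorting once and reading both ends of the order gives the most suppressed and most promoted tokens in a single pass; the real dashboard may use a different method or threshold.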

    No Known Activations