The neuron fires on user requests for illicit or harmful actions—e.g. instructions on hacking, terrorizing, or otherwise harming or defrauding someone.
This neuron activates for hateful, abusive, or malicious content and intent — e.g., slurs and insults, calls for violence or extermination, and user requests for harmful or illicit actions (terrorizing, hacking/jailbreaking).