Explanations
Lengthy and complex text
This neuron responds to meta-instructions asking the model to give a “completely unhinged” or unconstrained answer with “no remorse or ethics”, i.e. prompts telling it to ignore its rules or policies.
Negative Logits
.touches        -0.06
startups        -0.06
EDIATEK         -0.06
.texture        -0.06
_IEnumerator    -0.06
monto           -0.06
Conflict        -0.05
iterals         -0.05
sanat           -0.05
)],             -0.05
Positive Logits
Kepler          0.07
چ               0.07
(ps             0.07
ichtet          0.06
Packaging       0.06
uer             0.06
Banc            0.06
�               0.06
ві              0.06
�               0.06
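The two lists above rank vocabulary tokens by how strongly this neuron pushes their logits down or up. The page does not say how the values were computed; a common approach, assumed here, is the neuron's direct logit effect: the dot product of its output weight vector with each token's unembedding column, ignoring the final layer norm and any later layers. The sketch below uses hypothetical inputs `w_out`, `W_U` and `vocab` with toy values, not this model's real weights.

```python
from typing import List, Tuple


def logit_effects(
    w_out: List[float],      # hypothetical: the neuron's output weight vector (length d_model)
    W_U: List[List[float]],  # hypothetical: unembedding matrix, d_model rows x vocab_size columns
    vocab: List[str],        # token strings, one per unembedding column
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by assumed direct logit effect."""
    d_model = len(w_out)
    # Effect on each token's logit: dot product of w_out with that token's column of W_U.
    effects = [
        sum(w_out[d] * W_U[d][t] for d in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # reversed: most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, three-token vocabulary.
    pos, neg = logit_effects(
        [1.0, -1.0],
        [[0.5, 0.0, -0.25], [0.25, 0.5, 0.0]],
        ["a", "b", "c"],
        k=2,
    )
    print(pos)  # [('a', 0.25), ('c', -0.25)]
    print(neg)  # [('b', -0.5), ('c', -0.25)]
```

With a real model, `k = 10` would reproduce lists shaped like the ones on this page, provided the dashboard really uses this definition.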
Activation density: 0.003%
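Activation density is presumably the fraction of sampled tokens on which the neuron activates. The page does not state the threshold, so the sketch below assumes "active" means an activation strictly greater than a hypothetical `threshold` parameter (default 0). Under that reading, 0.003% is 0.00003, or roughly 3 tokens in every 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:  # hypothetical cutoff for counting a token as "active"
            active += 1
    if total == 0:
        raise ValueError("no activations given")
    return active / total


if __name__ == "__main__":
    # 3 active tokens out of 100,000 sampled tokens.
    acts = [0.0] * 99_997 + [1.0] * 3
    print(f"{activation_density(acts):.3%}")  # 0.003%
```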