INDEX
Explanations
Sentences or instructions that offer ways to bypass rules, laws, or ethical constraints, e.g., the phrase "get around", or a disclaimer followed by a workaround.
Jailbreak-style instructions that define an amoral AI persona and outline steps for bypassing ethical or legal restrictions.
Negative Logits
琴           -0.07
Subtitle    -0.07
_Buffer     -0.06
.textBox    -0.06
SEARCH      -0.06
famil       -0.06
antivirus   -0.06
Tex         -0.06
Implicit    -0.06
Severity    -0.06
Positive Logits
différents  0.06
>,          0.06
Define      0.06
총          0.06
-%          0.06
=datetime   0.06
aw          0.06
verdi       0.06
retrieve    0.06
创建        0.06
Activations Density 0.006%