INDEX
Explanations
programming safety principles
assistant safety-refusal boilerplate: declarations that the AI cannot comply and references to its safety guidelines, ethical principles, and programming by its creators.
statements where the assistant refuses a request by citing safety rules or limits, or by saying that it is "programmed" to be safe (i.e., refusal/safety-policy language).
Negative Logits
⌛    0.39
を使った    0.38
huj    0.38
igkeits    0.37
mouseX    0.37
suasana    0.37
成了    0.37
combos    0.37
eyeing    0.36
potions    0.36
Positive Logits
programmed    1.59
programming    1.48
programmed    1.38
Programming    1.31
Programming    1.30
programming    1.26
programmers    1.23
programmer    1.20
программи    1.20
programación    1.14
Activations Density 0.170%
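An activation density such as the one above is commonly read as the share of sampled token positions on which the feature fires. The following is a minimal sketch under that assumption; the page does not state its exact formula, and `activation_density`, its `threshold` parameter, and the sample values are hypothetical names and data chosen for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of token positions where the
    feature is active, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1000 token positions, feature fires on 2 of them.
sample = [0.0] * 998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3f}%")
```

A low density of this kind would be consistent with a feature that fires only on a narrow class of text, such as the refusal phrasing described in the explanations.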