    Explanations

    programming safety principles

    assistant safety-refusal boilerplate: declarations that the AI cannot comply and references to its safety guidelines, ethical principles, and programming by its creators.

    statements where the assistant refuses a request by citing safety rules, limits, or that it is "programmed" to be safe (i.e., refusal/safety-policy language).

    Negative Logits
                        0.39
    "を使った"           0.38
    " huj"              0.38
    "igkeits"           0.37
    "mouseX"            0.37
    " suasana"          0.37
    "成了"               0.37
    " combos"           0.37
    " eyeing"           0.36
    " potions"          0.36
    Positive Logits
    " programmed"       1.59
    " programming"      1.48
    "programmed"        1.38
    " Programming"      1.31
    "Programming"       1.30
    "programming"       1.26
    " programmers"      1.23
    " programmer"       1.20
    " программи"        1.20
    " programación"     1.14
    Act Density 0.170%

    No Known Activations