INDEX
    Explanations

    words meaning never or no

    the assistant's safety-focused disclaimers and strong refusal/ethical-warning statements.

    New Auto-Interp
    Negative Logits
    0.52
    보다는
    0.47
    0.46
     কিছুটা
    0.46
     somewhat
    0.45
     повече
    0.45
     বেশি
    0.45
     יותר
    0.45
     оптими
    0.44
    лишком
    0.43
    Positive Logits
     niemals
    0.67
     jamás
    0.62
     ningún
    0.62
     assolutamente
    0.61
     Nunca
    0.61
     never
    0.61
     NEVER
    0.60
     hiçbir
    0.59
    Nunca
    0.59
     ninguna
    0.58
    Act Density 0.449%

    No Known Activations