    Explanations

    Safety-focused refusals that empathetically redirect away from harmful or inappropriate requests, offering supportive guidance and crisis resources instead of compliance.

    Negative Logits
    buffs             0.92
    industrialists    0.86
    galera            0.84
    merchants         0.81
    suka              0.80
    big               0.80
    amateurs          0.80
    stal              0.79
    popul             0.78
    consumers         0.77
    Positive Logits
    hopelessness      1.11
    psychotherapy     1.08
    compassion        0.97
    trauma            0.97
    compassionate     0.96
    counseling        0.93
    PTSD              0.93
    grieve            0.93
    emotionally       0.93
    healing           0.92
    Act Density 2.784%

    No Known Activations