INDEX
    Explanations

    instances of derogatory language and inappropriate requests.

    Negative Logits
     chip         -0.06
    _classes      -0.06
                  -0.06
    \a            -0.06
     cycle        -0.06
     LiveData     -0.06
     Feast        -0.06
    .cond         -0.06
     buluş        -0.06
    Safe          -0.06
    Positive Logits
    updated       0.07
     guardian     0.07
     sincerely    0.07
     dando        0.06
    ilitary       0.06
     SPDX         0.06
     unchanged    0.06
    ARY           0.06
    Comparer      0.06
     decorator    0.06
    Activation Density: 0.006%

    No Known Activations