INDEX
    Explanations

    Mentions of racism or other harmful or discriminatory content, and policy-style refusals explaining why hateful content cannot be provided.

    NEGATIVE LOGITS
    " Unexpected"    0.87
    " Graphics"      0.83
    " الماء"         0.79
    "Unexpected"     0.78
    "InnoDB"         0.78
    "ከናወ"            0.78
    "Graphics"       0.77
    "Physics"        0.76
    " Acrobat"       0.76
    " शॉट"           0.76
    POSITIVE LOGITS
    " perpetuated"      1.83
    " patriarchal"      1.76
    " dehuman"          1.75
    " perpetuate"       1.73
    " misog"            1.71
    " authoritarian"    1.71
    " perpet"           1.71
    " capitalism"       1.69
    " totalitarian"     1.67
    " imperialism"      1.65
    Activation Density 1.108%

    No Known Activations