    Explanations

    overcoming discrimination and prejudice

    reinforces harmful stereotypes

    Negative Logits
    " अनुर"          0.50
    " необы"         0.49
    " enthusiast"    0.49
    " runny"         0.46
    " moelle"        0.44
    " உற்ச"          0.43
    " veloce"        0.42
    " Reliability"   0.42
    " সন্ন"          0.42
    " பக்தர்கள்"     0.42
    Positive Logits
    " discriminatory"   1.84
    " sexism"           1.80
    " discrimination"   1.73
    " racism"           1.73
    " racist"           1.70
    " misog"            1.69
    " sexist"           1.63
    " Discrimination"   1.55
    " Racism"           1.54
    "discrimination"    1.53
    Activation Density: 0.306%

    No Known Activations