INDEX
    Explanations

    phrases that discuss race and ethnicity in the context of judgment and equality

    Negative Logits
    Token            Logit
    "arges"          -0.15
    "#aa"            -0.14
    " meanings"      -0.14
    " actionTypes"   -0.14
    "Ð®ÐĽ"           -0.13
    " futures"       -0.13
    "UX"             -0.13
    " harms"         -0.13
    "ux"             -0.13
    " داÙħ"          -0.13
    Positive Logits
    Token            Logit
    " race"          0.49
    "race"           0.40
    " gender"        0.39
    " religion"      0.38
    " age"           0.37
    " Race"          0.36
    " ethnicity"     0.36
    " nationality"   0.35
    "Race"           0.34
    " sex"           0.33
    Act Density 0.250%
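
    As a rough illustration of how a figure like the activation density above could be read, the sketch below assumes it is the fraction of token positions on which the feature's activation is greater than zero. That definition, the function name activation_density, and the toy data are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density means "share of tokens on which the feature fires".
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 1 of 400 token positions.
toy_activations = [0.0] * 399 + [1.7]
density = activation_density(toy_activations)
print(f"Act Density {density:.3%}")  # prints "Act Density 0.250%"
```

    Under this reading, 0.250% corresponds to the feature firing on roughly 1 in every 400 tokens.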

    No Known Activations