INDEX
    Explanations

    terms related to different demographic characteristics or categories such as race, ethnicity, nationality, religion, and disabilities

    terms related to social identity and discrimination categories

    Negative Logits
    Token           Logit
    "writers"       -0.79
    " Raphael"      -0.72
    " Peng"         -0.67
    "sers"          -0.66
    " Canaver"      -0.65
    " Kers"         -0.64
    " Byr"          -0.64
    "said"          -0.64
    " HEL"          -0.63
    " Dean"         -0.62
    Positive Logits
    Token           Logit
    " ethnicity"     1.69
    " nationality"   1.59
    " gender"        1.54
    "Gender"         1.39
    " ethnic"        1.37
    " Gender"        1.33
    " creed"         1.28
    " sexuality"     1.27
    "gender"         1.26
    " Ethnic"        1.25
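
For readers unfamiliar with these lists, the sketch below shows one common way such rankings are produced: a feature's output direction is dotted with each token's unembedding vector, and the tokens are sorted by the resulting score. This is an assumption about how the dashboard computes its values, not something the page states. The function names, the 2-dimensional direction, and the toy unembedding vectors are all hypothetical and are not the model's real weights.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def logit_effects(direction: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Dot the feature direction with each token's unembedding vector."""
    return {
        token: sum(d * w for d, w in zip(direction, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy, made-up numbers purely for illustration (NOT real model weights).
toy_direction = [1.0, -0.5]
toy_unembed = {
    " ethnicity": [1.5, -0.4],
    " gender": [1.2, -0.6],
    "writers": [-0.6, 0.4],
    " Dean": [-0.5, 0.2],
    "the": [0.1, 0.2],
}

effects = logit_effects(toy_direction, toy_unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", [tok for tok, _ in positive])
print("negative:", [tok for tok, _ in negative])
```

Tokens are kept as dictionary keys with their leading spaces intact, since " gender" and "gender" are distinct vocabulary entries and appear separately in the lists above.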
    Activation Density: 0.171%

    No Known Activations