INDEX
    Explanations

    phrases related to societal oppression and marginalization

    terms related to marginalized and disenfranchised populations

    Negative Logits
    sis       -0.87
    odium     -0.80
    orah      -0.80
    etric     -0.76
    othy      -0.75
    aline     -0.72
    tein      -0.70
    etrical   -0.68
    ibal      -0.66
    ope       -0.66
    Positive Logits
     marginalized   0.80
     minorities     0.75
     Voices         0.75
    Marginal        0.74
     bystanders     0.70
     populations    0.70
     communities    0.69
    marg            0.69
    ĨĴ              0.68
     disadvantaged  0.66
    Act Density 0.046%

    No Known Activations