INDEX
    Explanations

    references to discrimination and marginalization of specific groups

    Negative Logits
    Token        Logit
    "aget"       -0.17
    "ë³µ"        -0.15
    "adro"       -0.15
    "meden"      -0.15
    "venile"     -0.15
    "ÙĪÙĦÙĬ"     -0.14
    "ÙĩاÛĮ"     -0.14
    "ä¿"         -0.14
    "?type"      -0.14
    "ADED"       -0.14
    Positive Logits
    Token            Logit
    " certain"       0.32
    " minorities"    0.25
    " Certain"       0.24
    " vulnerable"    0.24
    " Minor"         0.23
    "Certain"        0.23
    " people"        0.23
    " groups"        0.22
    " others"        0.22
    "Minor"          0.21
    Activation Density: 0.144%

    No Known Activations