    Explanations

    references to racism and racial prejudices

    Negative Logits
    Token              Logit
    "Personendaten"    -0.67
    " يتيمه"           -0.64
    "DockStyle"        -0.63
    " faptul"          -0.62
    "TagMode"          -0.60
    "Clik"             -0.59
    "DSS"              -0.59
    "OutputType"       -0.57
    "parse"            -0.56
    " perine"          -0.55
    Positive Logits
    Token              Logit
    " racist"          0.99
    " Racism"          0.94
    " racism"          0.91
    "racist"           0.88
    "Racism"           0.80
    " discriminatory"  0.74
    "discrimin"        0.74
    " Discrimin"       0.72
    " prejudices"      0.71
    " Prejudice"       0.70
    Activation Density: 0.024%

    No Known Activations