INDEX
    Explanations

    words related to racism

    references to racism and related accusations

    Negative Logits
    Token           Logit
    "tis"           -0.96
    "trak"          -0.88
    "amina"         -0.86
    "oning"         -0.84
    "ITNESS"        -0.79
    "rolog"         -0.78
    "RH"            -0.78
    "icular"        -0.77
    "aple"          -0.76
    "aver"          -0.75
    Positive Logits
    Token           Logit
    " racist"       1.18
    " racists"      1.14
    " slurs"        1.03
    " racism"       0.97
    " homophobic"   0.96
    " nationalist"  0.95
    " sexist"       0.93
    " stereotypes"  0.92
    " caric"        0.91
    " supremacist"  0.90
    Activation Density: 0.014%

    No Known Activations