INDEX
    Explanations

    terms related to racial issues and injustices

    Negative Logits
    Token            Logit
    "fak"            -0.14
    "pling"          -0.14
    "ares"           -0.14
    "iram"           -0.14
    "359"            -0.14
    "endencies"      -0.14
    "ioso"           -0.14
    "ogl"            -0.14
    "-worthy"        -0.14
    "emin"           -0.13

    Positive Logits
    Token            Logit
    "/color"          0.23
    " profiling"      0.19
    " minorities"     0.18
    "ized"            0.18
    "-neutral"        0.16
    "/class"          0.16
    "/E"              0.16
    " bait"           0.16
    "icious"          0.15
    " cleansing"      0.15
    Activation Density: 0.032%

    No Known Activations