INDEX
    Explanations

    mentions of racism-related terms

    terms related to race and racism

    Negative Logits
     tip      -0.71
     Lily     -0.66
     Dill     -0.66
     Joint    -0.62
     payload  -0.62
     Vita     -0.62
     Vera     -0.61
     Patient  -0.61
     lettuce  -0.60
     EFF      -0.59
    Positive Logits
    rac       4.60
    race      1.91
    racist    1.82
     Rac      1.77
    Race      1.33
    rag       1.20
    racial    1.18
    ran       1.17
    rab       1.15
    ras       1.15
    Activation Density: 0.014%

    No Known Activations