    Explanations

    phrases related to race, specifically focusing on the term "white"

    references to race and ethnic identities, particularly those related to white individuals and systemic issues

    Negative Logits
    Token        Logit
    BIL          -0.79
    bably        -0.78
    utenberg     -0.72
    ICLE         -0.69
    pmwiki       -0.66
    INAL         -0.66
    incial       -0.65
    inarily      -0.65
    FORMATION    -0.64
    ENTION       -0.64
    Positive Logits
    Token        Logit
    oak          0.72
    peria        0.69
    igans        0.67
    papers       0.67
    stadt        0.66
    sup          0.66
    oxide        0.66
    cloth        0.65
    sand         0.63
    ander        0.62
    Activation Density: 0.088%

    No Known Activations