INDEX
    Explanations

    mentions of racism and discussions surrounding racial stereotypes

    Negative Logits
    Token             Logit
    "Traversal"       -0.16
    " تشکیل"          -0.15
    "urve"            -0.15
    "mts"             -0.14
    "ako"             -0.14
    " Engl"           -0.14
    "ijd"             -0.14
    "rael"            -0.14
    "installation"    -0.14
    "CID"             -0.13

    Positive Logits
    Token             Logit
    " racial"          0.21
    " Native"          0.21
    " sensitivity"     0.20
    " race"            0.20
    " sensitive"       0.19
    " Race"            0.18
    " token"           0.18
    "racial"           0.18
    "Race"             0.18
    " appropri"        0.18
    Activation Density: 0.068%

    No Known Activations