    Explanations

    references to racism and race-related topics

    Negative Logits
    Token       Logit
    "es"        -0.21
    "ez"        -0.16
    "andez"     -0.15
    "ecurity"   -0.15
    " accord"   -0.15
    "urt"       -0.14
    "esen"      -0.14
    "ncia"      -0.14
    "íĿ¥"       -0.14
    "ancing"    -0.14
    Positive Logits
    Token       Logit
    "coon"      0.31
    "quet"      0.28
    " rac"      0.26
    " Rac"      0.26
    "oon"       0.25
    "oons"      0.21
    "lette"     0.21
    "rac"       0.20
    "quete"     0.20
    "etr"       0.20
    Act Density 0.007%

    No Known Activations