INDEX
    Explanations

    terms and phrases associated with white supremacy and racist ideologies

    Negative Logits
    Token        Logit
    "enson"      -0.18
    "olie"       -0.16
    " stro"      -0.15
    "jang"       -0.15
    "rupa"       -0.15
    " Dash"      -0.15
    "pleted"     -0.15
    " Verm"      -0.14
    " rect"      -0.14
    "inent"      -0.14
    Positive Logits
    Token        Logit
    "imizer"     0.15
    " Sadd"      0.14
    "praak"      0.14
    "ì"          0.14
    "Sortable"   0.14
    "vir"        0.14
    "ablish"     0.14
    "?option"    0.13
    "edom"       0.13
    "ître"       0.13
    Activation Density: 0.048%

    No Known Activations