INDEX
    Explanations

    references to extremist groups and hate-related content, especially related to the Ku Klux Klan

    references to hate groups and associated terms

    Negative Logits
    Downloadha    -0.81
    ded           -0.77
    dra           -0.77
    kj            -0.76
    */(           -0.75
    neau          -0.75
    gob           -0.74
    ochond        -0.74
    til           -0.73
    abilities     -0.72
    Positive Logits
     Klux         1.23
     Klan         1.12
     KKK          0.95
     Sabha        0.81
     affiliation  0.77
     affili       0.74
     NAACP        0.73
     robes        0.73
     Beir         0.72
     Jr           0.71
    Activation Density: 0.011%

    No Known Activations