INDEX
    Explanations

    references to white supremacist ideologies and groups

    Negative Logits
    Token      Logit
    "ronic"    -0.17
    "utr"      -0.17
    "rug"      -0.15
    "nette"    -0.14
    "iro"      -0.14
    "opp"      -0.14
    "argest"   -0.14
    "opping"   -0.14
    "inue"     -0.14
    "inde"     -0.14
    Positive Logits
    Token              Logit
    " groups"          0.28
    " Groups"          0.23
    "-groups"          0.20
    "groups"           0.19
    "Groups"           0.19
    "(groups"          0.18
    "_groups"          0.18
    " organizations"   0.17
    " group"           0.16
    " Odin"            0.16
    Activation Density: 0.038%

    No Known Activations