INDEX
    Explanations

    references to hate groups and their leaders

    Negative Logits
    Token               Logit
    " Fuse"             -0.17
    "fection"           -0.16
    ".Sdk"              -0.15
    "765"               -0.15
    " Mul"              -0.14
    "421"               -0.14
    "Fuse"              -0.14
    " defe"             -0.14
    "oppel"             -0.14
    "ä»ĭ"               -0.14

    Positive Logits
    Token               Logit
    " racist"           0.25
    " rac"              0.23
    " Ku"               0.22
    " Charlottesville"  0.22
    "KK"                0.21
    "-Nazi"             0.21
    " Klan"             0.21
    " Hitler"           0.21
    " supremacist"      0.21
    "rac"               0.20
    Activation Density: 0.201%

    No Known Activations