INDEX
    Explanations

    references to white nationalism and supremacist ideology

    Head Attr Weights
    0:0.03
    1:0.03
    2:0.07
    3:0.06
    4:0.05
    5:0.06
    6:0.06
    7:0.06
    8:0.04
    9:0.06
    10:0.16
    11:0.28
    Negative Logits
    " hijab"        -2.47
    "marg"          -2.27
    " accent"       -2.27
    " backgrounds"  -2.27
    " rgb"          -2.22
    " borders"      -2.16
    " uniform"      -2.15
    " bias"         -2.13
    " uniforms"     -2.12
    " dress"        -2.11
    Positive Logits
    "abolic"   2.20
    "ramer"    2.05
    "pload"    1.97
    "QB"       1.91
    "venture"  1.91
    "udder"    1.91
    "oaded"    1.89
    "verend"   1.89
    "hof"      1.85
    "omsday"   1.84
    Act Density 0.001%

    No Known Activations