INDEX
Explanations
references to white nationalism and supremacist ideology
Head Attr Weights
0:0.03
1:0.03
2:0.07
3:0.06
4:0.05
5:0.06
6:0.06
7:0.06
8:0.04
9:0.06
10:0.16
11:0.28
Negative Logits
hijab        -2.47
marg         -2.27
accent       -2.27
backgrounds  -2.27
rgb          -2.22
borders      -2.16
uniform      -2.15
bias         -2.13
uniforms     -2.12
dress        -2.11
Positive Logits
abolic   2.20
ramer    2.05
pload    1.97
QB       1.91
venture  1.91
udder    1.91
oaded    1.89
verend   1.89
hof      1.85
omsday   1.84
Activations Density: 0.001%