Explanations
References to race, specifically to white individuals or groups, in a socio-political context.
Negative Logits
adin	-0.18
ibel	-0.16
zen	-0.15
åħ¥ãĤĬ	-0.14
lĩnh	-0.14
fak	-0.14
andin	-0.14
calls	-0.13
xdf	-0.13
angl	-0.13
Positive Logits
mun	0.17
ubar	0.15
лиÑħ	0.15
dae	0.15
å¥	0.14
hel	0.13
aver	0.13
DIY	0.13
Quit	0.13
mans	0.13
Activations Density: 0.032%