INDEX
Explanations
discussions about racial and ethnic discrimination
Negative Logits
actionTypes  -0.16
#aa          -0.15
iesel        -0.15
Plantae      -0.14
arges        -0.14
ront         -0.14
achten       -0.13
Academ       -0.13
pawn         -0.13
Yates        -0.13
Positive Logits
race         0.51
race         0.41
religion     0.39
Race         0.39
gender       0.37
age          0.36
sex          0.36
Race         0.35
ethnicity    0.34
skin         0.32
Activations Density 0.198%