Explanations
references to age, race, or class distinctions with a focus on individuals of color
Negative Logits
Token           Logit
amount          -0.15
osti            -0.15
eness           -0.15
ifo             -0.15
irim            -0.15
rates           -0.15
abis            -0.14
aminer          -0.14
.presentation   -0.14
aty             -0.14
Positive Logits
Token           Logit
Means           0.19
stature         0.18
consequence     0.18
integrity       0.18
means           0.17
mixed           0.17
Means           0.16
goodwill        0.16
substance       0.15
ë¯              0.15
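The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The following is a minimal sketch of how such lists could be produced. It assumes, which the page does not state, that each value is the feature's decoder direction projected through the model's unembedding matrix. The inputs `decoder_dir` and `W_U` and both helper names are hypothetical.

```python
import numpy as np

def logit_effects(decoder_dir, W_U):
    """Project a feature's decoder direction onto the vocabulary.

    decoder_dir: shape (d_model,); W_U: shape (d_model, vocab_size).
    Both are hypothetical inputs assumed for this sketch.
    """
    return np.asarray(decoder_dir, dtype=float) @ np.asarray(W_U, dtype=float)

def top_and_bottom(effects, vocab, k=10):
    """Return the k highest and the k lowest (token, value) pairs."""
    order = np.argsort(effects)  # indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: a 2-dimensional model with a 3-token vocabulary.
effects = logit_effects([1.0, 0.0], [[0.2, -0.1, 0.05], [9.0, 9.0, 9.0]])
positive, negative = top_and_bottom(effects, ["a", "b", "c"], k=1)
print(positive, negative)
```

Because the lists are ranked on raw values, two vocabulary entries that render identically (for example, "Means" with and without a leading space) can appear as separate rows with different scores.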
Activation Density: 0.050%
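Activation density is most naturally read as the share of token positions on which the feature fires. On that reading, 0.050% corresponds to roughly 1 token in 2,000. The following is a minimal sketch. It assumes that "fires" means an activation strictly above zero; both that threshold and the helper `activation_density` are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("need at least one activation")
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# One active position out of 2,000 gives the density shown above.
acts = np.zeros(2000)
acts[123] = 3.7
print(f"{activation_density(acts):.3f}%")  # prints 0.050%
```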