INDEX
Explanations
terms related to racial or social justice issues
Negative Logits
ammen          -0.15
defaultCenter  -0.14
hq             -0.14
avr            -0.14
Strip          -0.14
ipay           -0.13
ÑĢÑĸп         -0.13
alars          -0.13
IGH            -0.13
Gaga           -0.13
Positive Logits
race         0.31
quotas       0.28
affirmative  0.28
race         0.27
Race         0.27
admissions   0.26
Race         0.26
diversity    0.24
quota        0.24
racial       0.24
Activations Density: 0.067%