INDEX
Explanations
references to significant actions and historical contributions related to social justice and discrimination
Negative Logits
featureID    -0.55
poisson      -0.52
GeoNames     -0.51
hacker       -0.50
paš          -0.49
Fleury       -0.47
ByVersion    -0.47
Kiri         -0.46
hackers      -0.45
punct        -0.45
Positive Logits
racial       1.49
racism       1.45
racist       1.40
racially     1.36
Racial       1.31
Racism       1.27
racial       1.23
segregation  1.16
race         1.15
Racism       1.14
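Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting score. This page does not say how its values were computed, so the sketch below only assumes that method. The function name `logit_effects`, the toy decoder vector, the toy unembedding matrix, and the four-token vocabulary are all hypothetical.

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Hypothetical sketch: score each vocabulary token by the dot product of a
    feature's decoder direction with that token's unembedding column, then return
    the k most negative and the k most positive (token, score) pairs."""
    d_model = len(decoder_vec)
    scores = []
    for j, token in enumerate(vocab):
        # unembed is given as d_model rows, each of length len(vocab)
        score = sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy, made-up numbers purely for illustration (not taken from this feature).
decoder_vec = [1.0, -0.5]
unembed = [
    [0.9, -0.2, 0.7, -0.4],
    [-0.6, 0.4, 0.0, 0.2],
]
vocab = ["racial", "hacker", "segregation", "poisson"]
neg, pos = logit_effects(decoder_vec, unembed, vocab, k=2)
print("negative:", [(t, round(s, 2)) for t, s in neg])
print("positive:", [(t, round(s, 2)) for t, s in pos])
```

In a real model the decoder vector and unembedding matrix would come from the trained sparse autoencoder and the language model; the ranking step is the same.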
Activation Density: 0.549%
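Activation density is typically read as the percentage of tokens on which the feature fires, i.e. has a nonzero activation. Under that reading, 0.549% means the feature is active on roughly 5.49 of every 1,000 tokens. The minimal sketch below assumes that definition and a firing threshold of zero. The function name `activation_density` and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: percentage of tokens whose feature activation
    exceeds `threshold` (assumed to be 0.0, i.e. 'the feature fired')."""
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: 1,000 tokens, of which 5 have a positive activation.
toy_activations = [0.0] * 995 + [1.2] * 5
print(f"{activation_density(toy_activations):.3f}%")
```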