Explanations
terms related to racism
references to racism and racially charged comments
Negative Logits
Token           Logit
fulness         -0.75
agility         -0.69
ren             -0.67
riage           -0.66
Mech            -0.66
stead           -0.66
endor           -0.65
Imaging         -0.65
Olympus         -0.65
availability    -0.65
Positive Logits
Token           Logit
racist           3.44
racist           2.65
racists          2.61
racially         2.29
homophobic       2.21
sexist           2.15
racism           2.06
discriminatory   1.98
racial           1.90
acist            1.81
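The page does not say how the two logit lists are computed. A common approach for sparse-autoencoder features, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and keep the highest- and lowest-scoring tokens. The sketch below uses hypothetical helper names (`logit_effects`, `top_and_bottom`) and toy weights, not the real model's.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect = decoder direction (length d_model) projected through the
# unembedding matrix (d_model rows x vocab columns). Toy weights, not real ones.

def logit_effects(decoder_dir, unembed):
    """Return one score per vocabulary column j: sum_i decoder_dir[i] * unembed[i][j]."""
    vocab_size = len(unembed[0])
    return [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(vocab_size)
    ]


def top_and_bottom(scores, tokens, k):
    """Return (k highest, k lowest) (token, score) pairs, most extreme first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    bottom = [(tokens[j], scores[j]) for j in order[:k]]
    top = [(tokens[j], scores[j]) for j in reversed(order[-k:])]
    return top, bottom


tokens = ["racist", "agility", "racial", "Olympus"]
decoder_dir = [1.0, -0.5]
unembed = [
    [2.0, -0.5, 1.5, -0.2],
    [-1.0, 0.4, 0.2, 0.9],
]

scores = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(scores, tokens, k=2)
print([t for t, _ in top])     # ['racist', 'racial']
print([t for t, _ in bottom])  # ['agility', 'Olympus']
```

Because the ranking runs over vocabulary entries rather than whole words, subword fragments such as `acist` or `riage` can show up in lists like the ones above.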
Activation Density: 0.040%
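Activation density is read here as the share of token positions on which the feature's activation is above zero; that definition and the `activation_density` helper are assumptions for illustration, not taken from the page. Under that reading, 0.040% means the feature fires on about 4 of every 10,000 tokens.

```python
# Hypothetical sketch: activation density as the percentage of positions
# where the feature's activation exceeds a threshold (assumed to be 0).

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation is strictly above threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy data: 10,000 positions, the feature fires on 4 of them.
acts = [0.0] * 10_000
for i in (17, 512, 4096, 9000):
    acts[i] = 1.3

print(f"{activation_density(acts):.3f}%")  # 0.040%
```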