INDEX
Explanations
phrases related to marginalized groups, social justice, and diversity
references to marginalized groups and discussions about inequality
Negative Logits
Nut                 -0.69
nit                 -0.68
tun                 -0.65
quickShipAvailable  -0.65
Majesty             -0.64
aic                 -0.63
inventoryQuantity   -0.63
orius               -0.63
LOCK                -0.62
});                 -0.62
Positive Logits
residing            1.05
hood                1.03
living              1.01
oppressed           0.95
disproportionately  0.93
marrying            0.89
marginalized        0.88
experiencing        0.87
disproportion       0.87
privilege           0.85
Activations Density: 0.249%