INDEX
Explanations
references to discriminatory actions or behavior based on various factors such as religion, gender, or sexual orientation
terms related to discrimination, particularly in social and religious contexts
Negative Logits
money         -0.83
creation      -0.80
Reb           -0.80
examination   -0.73
Money         -0.70
uro           -0.68
uild          -0.68
anol          -0.67
oly           -0.66
forum         -0.66
Positive Logits
discriminated    1.20
discriminate     1.15
discrim          1.08
discriminating   0.95
inately          0.94
inates           0.92
retali           0.81
dinand           0.80
retaliate        0.79
discriminatory   0.76
Activations Density: 0.011%