INDEX
Explanations
terms related to racism
discourse related to racism
Negative Logits
TAIN      -0.82
ITNESS    -0.67
arters    -0.63
cise      -0.61
Expend    -0.61
OPER      -0.60
RPM       -0.60
abort     -0.59
icles     -0.59
rpm       -0.59
Positive Logits
prejudice  1.05
ophobia    0.92
ophobic    0.90
perv       0.84
itism      0.81
rife       0.79
worsened   0.76
prejud     0.76
rampant    0.75
racism     0.75
Activations Density 0.039%
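
The page does not state how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The dashboard itself does not confirm this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
acts = [0.0] * 9996 + [1.3, 0.2, 2.7, 0.9]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.039% would mean the feature fires on roughly 4 of every 10,000 tokens, which is consistent with a sparse feature tied to a narrow topic.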