Explanations
instances of bias or prejudice in text
references to bias and prejudice in various contexts
Negative Logits
Token    Logit
tons     -0.74
esc      -0.72
zech     -0.70
export   -0.69
sie      -0.68
loo      -0.68
phan     -0.67
akura    -0.66
Money    -0.66
URE      -0.65
Positive Logits
Token     Logit
favoring  1.09
icial     1.02
towards   1.01
toward    1.01
bias      0.92
prejud    0.91
biases    0.90
against   0.86
detector  0.83
biased    0.79
Activations Density: 0.042%
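
As a rough illustration of how a density figure of this kind can be read, the sketch below assumes that activation density means the fraction of tokens on which the feature's activation is above zero. That definition, the function name activation_density, the threshold parameter, and the sample data are all assumptions made for illustration; none of them come from the dashboard itself. Under that reading, 0.042% corresponds to roughly 4 tokens in every 10,000.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature
    fires at all. The dashboard does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 tokens, 4 of which activate the feature.
sample = [0.0] * 9996 + [1.3, 0.2, 2.7, 0.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.040%
```

A feature this sparse fires on very few tokens, which is in keeping with a narrow topic such as explicit mentions of bias or prejudice.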