INDEX
Explanations
references to a specific demographic group
references to privilege associated with race
Negative Logits
mortg            -0.76
polls            -0.73
democratically   -0.68
filibuster       -0.65
mathemat         -0.65
tariff           -0.63
indu             -0.62
contrace         -0.62
widgets          -0.62
mortgages        -0.61
Positive Logits
msec             0.76
itans            0.68
interstitial     0.67
INGTON           0.67
YING             0.67
GoldMagikarp     0.67
toggle           0.66
cca              0.66
ornings          0.65
Spect            0.64
Activations Density: 0.000%