INDEX
Explanations
phrases related to racial considerations in political contexts
Head Attr Weights
0:0.07
1:0.03
2:0.02
3:0.09
4:0.05
5:0.07
6:0.04
7:0.03
8:0.43
9:0.06
10:0.02
11:0.03
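One way to read the head attribution weights is to rank them. The minimal sketch below copies the twelve head:weight pairs listed above into a dictionary and reports the heads with the largest weights. It assumes, without confirmation from the dashboard, that each weight is that attention head's share of the feature's attribution. The helper name `top_heads` is hypothetical.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each value is that head's share of the feature's attribution.
HEAD_ATTR_WEIGHTS = {
    0: 0.07, 1: 0.03, 2: 0.02, 3: 0.09, 4: 0.05, 5: 0.07,
    6: 0.04, 7: 0.03, 8: 0.43, 9: 0.06, 10: 0.02, 11: 0.03,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, descending.

    sorted() is stable, so heads with equal weights keep ascending index order.
    """
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    for head, w in top_heads(HEAD_ATTR_WEIGHTS):
        print(f"head {head}: {w:.2f}")
    # The listed values add up to 0.94, not 1.00; this may be due to rounding
    # each weight to two decimals (an assumption, not stated by the dashboard).
    print(f"sum of listed weights: {sum(HEAD_ATTR_WEIGHTS.values()):.2f}")
```

Head 8 (0.43) stands out. The next-largest weight is head 3 (0.09), so most of the listed attribution sits on a single head.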
Negative Logits
��: -2.00
lisher: -1.87
��: -1.84
��極: -1.81
cedented: -1.67
breakthrough: -1.64
Roose: -1.58
eks: -1.58
ende: -1.57
outputs: -1.55
Positive Logits
lane: 1.85
Against: 1.82
ethnicity: 1.79
wcsstore: 1.70
against: 1.66
mattered: 1.65
vanquished: 1.63
="#: 1.61
innocence: 1.60
jurors: 1.60
Activation Density: 0.003%
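A density of 0.003% is easier to picture as a count. The sketch below converts the percentage into a fraction and an expected number of activating tokens. It assumes that "density" means the fraction of tokens in the dashboard's sampled dataset on which the feature fires; the dashboard does not define the term here. The function names are hypothetical.

```python
def density_percent_to_rate(percent):
    """Convert a density given as a percentage into a fraction of tokens."""
    return percent / 100.0


def expected_activations(percent, n_tokens):
    """Expected number of activating tokens among n_tokens at this density.

    Assumption: density = fraction of tokens on which the feature activates.
    """
    return density_percent_to_rate(percent) * n_tokens


if __name__ == "__main__":
    rate = density_percent_to_rate(0.003)
    print(f"fraction of tokens: {rate:.0e}")
    print(f"per 1,000,000 tokens: {expected_activations(0.003, 1_000_000):.0f}")
```

Under that reading, 0.003% is a fraction of 3 × 10⁻⁵, or roughly 30 activating tokens per million, which makes this a very sparse feature.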