INDEX
Explanations
references to civil rights organizations and social justice advocacy groups
Negative Logits
elib      -0.16
orb       -0.15
bilt      -0.15
оля       -0.15
forme     -0.15
lef       -0.14
dst       -0.14
las       -0.14
fenced    -0.14
rok       -0.14
Positive Logits
oyer      0.22
ambi      0.15
rier      0.15
astos     0.15
ayi       0.15
Person    0.15
bare      0.14
thora     0.14
ERA       0.14
uggage    0.14
Activations Density 0.019%