INDEX
Explanations
references to marginalized or vulnerable social groups
Negative Logits
CTYPE         -0.15
βά            -0.15
clas          -0.15
Inflate       -0.14
uzzi          -0.14
¼åIJĪ         -0.14
&type         -0.14
éĻ£           -0.14
Dunn          -0.14
axe           -0.14

Positive Logits
477            0.17
/tiny          0.15
sector         0.15
population     0.15
visibility     0.15
:numel         0.15
772            0.15
special        0.14
exist          0.14
group          0.14
Activations Density: 0.008%