Explanations
discussions surrounding societal norms and injustices related to gender and race
Negative Logits
illow         -0.16
boxed         -0.15
urve          -0.14
ÏĢη           -0.14
Interval      -0.14
ILER          -0.14
OTTOM         -0.14
.bootstrap    -0.13
Interval      -0.13
olie          -0.13
Positive Logits
baum          0.17
aldi          0.16
ska           0.16
PRI           0.15
Celt          0.15
attr          0.15
angan         0.14
éĢ            0.14
Ïģεια         0.14
own           0.13
Activations Density 0.253%