INDEX
    Explanations

    discussions surrounding societal norms and injustices related to gender and race

    Negative Logits
    illow       -0.16
    boxed       -0.15
    urve        -0.14
    ÏĢη         -0.14
    Interval    -0.14
    ILER        -0.14
    OTTOM       -0.14
    .bootstrap  -0.13
     Interval   -0.13
    olie        -0.13
    Positive Logits
    baum        0.17
    aldi        0.16
    ska         0.16
     PRI        0.15
     Celt       0.15
     attr       0.15
    angan       0.14
    éĢ          0.14
    Ïģεια       0.14
     own        0.13
    Act Density 0.253%

    No Known Activations