INDEX
    Explanations

    references to racial and socio-economic disparities

    Negative Logits
     avoid
    -0.19
    避
    -0.17
    Avoid
    -0.16
    olon
    -0.16
     Avoid
    -0.16
    avoid
    -0.15
    ازÙĩ
    -0.15
    ould
    -0.15
     prevent
    -0.14
    OLON
    -0.14
    Positive Logits
     black
    0.51
     Black
    0.44
    black
    0.43
    Black
    0.41
    黑
    0.40
     BLACK
    0.39
     African
    0.37
    黒
    0.37
    -black
    0.37
    BLACK
    0.34
    Act Density 0.145%

    No Known Activations