INDEX
    Explanations

    references to issues of race, gender, and social equity

    Negative Logits
    Token             Logit
    "amba"            -0.15
    "éĩ"              -0.14
    "rey"             -0.14
    " ActionTypes"    -0.14
    "_usec"           -0.13
    "evin"            -0.13
    "ourke"           -0.13
    " underrated"     -0.13
    "inan"            -0.13
    " sag"            -0.13

    Positive Logits
    Token             Logit
    " white"          0.34
    "white"           0.29
    "-white"          0.29
    "çϽ"              0.28
    " White"          0.26
    " WHITE"          0.26
    "_white"          0.26
    " çϽ"             0.26
    "White"           0.25
    " whites"         0.25
    Activation Density: 0.166%
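
The density figure above is usually read as the fraction of sampled tokens on which the feature fires. The sketch below illustrates that reading only; the function name `activation_density`, the strictly-greater-than-zero threshold, and the sample data are all assumptions for illustration, not the dashboard's actual computation.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: a token counts as "active" when its feature activation
    exceeds the threshold. An empty input yields a density of 0.0.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical per-token activations: 3 of 8 tokens are active.
sample = [0.0, 0.0, 1.2, 0.0, 0.4, 0.0, 0.0, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.166% would correspond to roughly 166 active tokens per 100,000 sampled.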

    No Known Activations