INDEX
    Explanations

    references to social issues and marginalized communities

    Negative Logits
    uddy
    -0.15
     stupidity
    -0.14
    EMS
    -0.14
    ลาด
    -0.14
    icide
    -0.14
    egot
    -0.14
    afil
    -0.13
    _LA
    -0.13
    ้อน
    -0.13
    apan
    -0.13
    Positive Logits
     without
    0.28
     cut
    0.28
     left
    0.27
     denied
    0.27
     excluded
    0.25
     disen
    0.24
     shut
    0.23
     isolated
    0.23
     discrim
    0.23
     effectively
    0.23
    Act Density 0.137%

    No Known Activations