INDEX
    Explanations

    phrases or words that indicate something being disproportionately more or less than expected

    discussions of disproportionate impacts or effects on various groups

    Negative Logits
    ince
    -0.77
    ht
    -0.74
    ired
    -0.72
    uring
    -0.72
    held
    -0.71
    icist
    -0.70
    PT
    -0.70
    adal
    -0.70
    love
    -0.70
    ain
    -0.69
    Positive Logits
     disproportionately
    1.07
     disproportion
    1.04
     サーティワン
    0.83
     disadvant
    0.81
     impacts
    0.81
    ゼウス
    0.80
     disadvantages
    0.79
     proport
    0.78
     adolesc
    0.78
     shenan
    0.76
    Act Density 0.014%

    No Known Activations