    Explanations

    phrases related to disproportional distributions or impacts

    terms related to disproportionate impacts or inequalities affecting specific groups

    Negative Logits
    "shire"    -0.79
    "hus"      -0.79
    "ince"     -0.77
    "love"     -0.75
    "uring"    -0.75
    "icist"    -0.74
    "adal"     -0.74
    "ode"      -0.73
    "sein"     -0.73
    "icism"    -0.73
    Positive Logits
    " disproportion"        0.98
    " disproportionately"   0.91
    " disadvant"            0.84
    " amounts"              0.78
    " disadvantage"         0.77
    " サーティワン"         0.75
    " impacts"              0.75
    "pling"                 0.75
    " representation"       0.75
    " disadvantages"        0.74
    Activation Density: 0.022%

    No Known Activations