INDEX
    Explanations

    phrases related to potential consequences faced by individuals or groups

    issues related to facing challenges or penalties

    Negative Logits
    player                  -0.75
    players                 -0.74
    meta                    -0.69
     Users                  -0.67
    atell                   -0.65
     Cth                    -0.64
     Humans                 -0.64
    REDACTED                -0.63
    sama                    -0.63
    alian                   -0.62
    Positive Logits
     preferential           1.09
     protections            1.00
     protection             0.95
     deportation            0.94
     refunds                0.94
     treatment              0.94
     disproportionately     0.94
     undue                  0.93
     disproportionate       0.90
     brunt                  0.89
    Activation Density: 0.524%

    No Known Activations