INDEX
Explanations
phrases related to potential consequences faced by individuals or groups
issues related to facing challenges or penalties
Negative Logits
player    -0.75
players   -0.74
meta      -0.69
Users     -0.67
atell     -0.65
Cth       -0.64
Humans    -0.64
REDACTED  -0.63
sama      -0.63
alian     -0.62
Positive Logits
preferential        1.09
protections         1.00
protection          0.95
deportation         0.94
refunds             0.94
treatment           0.94
disproportionately  0.94
undue               0.93
disproportionate    0.90
brunt               0.89
Activations Density: 0.524%