INDEX
Explanations
phrases related to disrespect, insult, or humiliation towards individuals or groups
instances of disrespect or hostility towards individuals or groups
Negative Logits
minster           -0.71
forward           -0.70
kj                -0.69
ogg               -0.69
HEAD              -0.66
combe             -0.65
soDeliveryDate    -0.65
atonin            -0.64
aunder            -0.64
Netflix           -0.63
Positive Logits
gays              0.95
homosexuals       0.93
minorities        0.93
Mexicans          0.89
Muslims           0.85
foreigners        0.85
humanity          0.85
Hispanics         0.84
Arabs             0.84
anyone            0.83
Activations Density: 0.259%