Explanations
verbs related to negative characterization or treatment of individuals
terms related to the marginalization and denigration of individuals or groups
Negative Logits
Token       Logit
venture     -0.69
horizont    -0.68
arrang      -0.67
querque     -0.66
uid         -0.66
shake       -0.65
quart       -0.65
vent        -0.65
chart       -0.64
aldo        -0.64
Positive Logits
Token         Logit
slurs         0.76
stereotypes   0.76
insults       0.74
enance        0.72
caricature    0.72
Enemy         0.71
bullies       0.70
homosexuals   0.69
foreigners    0.69
Koreans       0.68
Activations Density: 0.123%