INDEX
Explanations
phrases related to political criticism and opposition
references to political controversy and public outrage
Negative Logits
Token          Logit
prepar         -0.71
eday           -0.71
[|             -0.70
depended       -0.68
~/             -0.68
MAP            -0.67
Intermediate   -0.66
000            -0.64
[(             -0.64
8000           -0.64
Positive Logits
Token          Logit
misogyny       1.51
sexism         1.47
sexist         1.43
misogyn        1.39
homophobia     1.36
homophobic     1.33
scandals       1.27
bigotry        1.27
hypocrisy      1.24
racism         1.22
Activations Density 1.144%
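
As a rough illustration of how a density figure like the one above can be read, the sketch below treats density as the fraction of token positions on which the feature's activation is above zero. This definition is an assumption, not something the page states; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions on which
    the feature fires; the source page does not define it.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 1000 token positions, 11 of which activate.
toy_activations = [0.0] * 989 + [0.5] * 11
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 1.100%
```

Under this reading, a density of 1.144% would mean the feature is active on roughly 11 of every 1000 tokens sampled.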