INDEX
Explanations
negative myths and stereotypes about various subjects like gender, sexuality, and physical appearance
references to myths and stereotypes, particularly those that have negative effects on individuals or groups
Negative Logits
ptroller    -0.84
andestine   -0.76
shall       -0.76
etsk        -0.74
issions     -0.72
Delivery    -0.72
assador     -0.70
itness      -0.70
uddin       -0.69
otos        -0.69
Positive Logits
stereotypes   1.24
stereotype    1.19
stereotyp     1.17
notions       1.09
ingrained     1.06
biases        1.03
prejudice     1.02
prejudices    0.99
stigma        0.98
perpet        0.95
Activations Density: 0.320%