INDEX
Explanations
terms related to negative social attitudes and discrimination, particularly homophobia
references to homophobia and related themes
Negative Logits
coat         -0.79
thin         -0.75
Solitaire    -0.73
Dur          -0.73
ding         -0.71
Path         -0.69
MER          -0.69
Weaver       -0.68
hig          -0.67
vier         -0.67
Positive Logits
homophobic   1.24
homophobia   0.99
slurs        0.85
gay          0.78
yip          0.77
barr         0.76
sexist       0.75
gay          0.74
prejudice    0.73
queer        0.72
Activations Density: 0.012%