INDEX
Explanations
references to the LGBTQ+ community, with a specific focus on gay individuals
references to gay identity and culture
Negative Logits
Token                 Logit
Condition             -0.76
è¦ļéĨĴ                -0.75
ufact                 -0.74
æ©Ł                   -0.73
hower                 -0.73
Manufacturer          -0.71
guiActiveUnfocused    -0.70
âĵĺ                   -0.70
ç«                    -0.70
sidx                  -0.69
Positive Logits
Token       Logit
atri        1.02
dar         0.93
couples     0.87
bie         0.87
lord        0.85
glers       0.84
marriage    0.83
ening       0.82
bies        0.82
ened        0.80
Activations Density 0.019%
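
The dashboard does not say how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the feature's activation is non-zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, only 2 of which activate the feature.
sample = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.019% would mean the feature fires on roughly 2 of every 10,000 tokens, which fits a narrowly scoped feature.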