INDEX
Explanations
terms and phrases related to the LGBT community and identities
Head Attr Weights
0:0.04
1:0.04
2:0.08
3:0.32
4:0.02
5:0.12
6:0.02
7:0.08
8:0.04
9:0.02
10:0.12
11:0.05
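
The weights above can be ranked to see which head dominates. This is a minimal sketch, assuming each entry is a `head_index:weight` pair; the names `head_weights` and `rank_heads` are hypothetical and not part of the original page.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each "i:w" entry means head i has attribution weight w.
head_weights = {
    0: 0.04, 1: 0.04, 2: 0.08, 3: 0.32, 4: 0.02, 5: 0.12,
    6: 0.02, 7: 0.08, 8: 0.04, 9: 0.02, 10: 0.12, 11: 0.05,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted by descending weight, ties by head index."""
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]

# Sum of the listed (rounded) weights; it need not be exactly 1.
total = sum(head_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

With the values listed, head 3 carries the largest weight, and the listed weights sum to 0.95 rather than 1, which may simply reflect rounding to two decimals.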
Negative Logits
warehouses   -2.00
receipts     -1.96
defenses     -1.92
coffers      -1.84
treasures    -1.80
kitchens     -1.79
greatness    -1.74
ceilings     -1.73
ngth         -1.72
marqu        -1.71
Positive Logits
Female       2.23
female       2.20
lesbian      2.09
emale        2.06
binary       1.97
cycle        1.94
ouple        1.93
omen         1.92
itizen       1.91
Male         1.88
Activations Density 0.002%