INDEX
Explanations
terms related to gender and sexual identity
Negative Logits
homosexuals      -0.19
çªģ              -0.16
.urlopen         -0.14
ilio             -0.14
homosexuality    -0.14
.isUser          -0.14
gays             -0.14
Pig              -0.13
mere             -0.13
IO               -0.13
Positive Logits
trans            0.30
Trans            0.24
Trans            0.23
cis              0.23
Bis              0.23
trans            0.22
fluid            0.20
cis              0.20
pan              0.20
questioning      0.19
Activations Density 0.058%