Explanations
references to gender identity and related controversies
Negative Logits
angs          -0.17
ius           -0.15
aran          -0.15
coc           -0.14
Bai           -0.14
AccessType    -0.14
opia          -0.14
lÃłnh         -0.14
offsetof      -0.14
æĻ            -0.14
Positive Logits
transgender   0.22
-binary       0.18
vag           0.18
binary        0.17
cis           0.17
Binary        0.17
Binary        0.17
biological    0.17
trans         0.17
ermo          0.17
Activations Density 0.059%