Explanations
mentions of gender identity and sexual orientation, particularly related to women and the LGBTQ+ community
Negative Logits
jom          -0.15
Habit        -0.15
pcf          -0.14
åıİ          -0.14
homosexuals  -0.14
Habitat      -0.14
ibu          -0.13
zioni        -0.13
-gradient    -0.13
깨           -0.13
Positive Logits
identifying     0.30
identification  0.28
identify        0.28
identity        0.28
identities      0.26
Identification  0.26
ident           0.26
identifies      0.26
identity        0.25
Identify        0.25
Activations Density: 0.117%