Explanations
mentions of sexual orientation and gender identity in the context of societal discussions
Negative Logits
Token     Logit
keley     -0.15
uien      -0.14
olon      -0.14
èĦ        -0.14
ieres     -0.14
ladu      -0.14
bert      -0.14
decess    -0.14
slut      -0.13
ibre      -0.13
Positive Logits
Token            Logit
same             0.30
orientation      0.29
hom              0.28
homosexuality    0.27
Hom              0.27
Same             0.27
sod              0.27
same             0.26
åIJĮ              0.26
Orientation      0.25
Activations Density 0.065%
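
A minimal sketch of how a density figure like the one above could be computed, assuming it means the fraction of token positions on which the feature fires above zero. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active; the dashboard's exact definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 13 of 20,000 token positions.
toy_activations = [0.0] * 19987 + [1.5] * 13
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.065%
```

A density this low indicates a sparse feature: under the assumption above, it is active on roughly 6 or 7 tokens out of every 10,000.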