INDEX
Explanations
references to the LGBTQ+ community
terms related to homosexuality
Negative Logits
utions     -0.68
Mehran     -0.65
shroud     -0.65
Weir       -0.64
çĦ         -0.63
Nile       -0.61
screws     -0.61
Spur       -0.60
fir        -0.59
negatives  -0.59
Positive Logits
emade      1.68
osexual    1.55
ework      1.47
estead     1.46
icide      1.32
eless      1.29
eland      1.20
eline      1.06
ogeneous   1.06
eworld     1.05
Activations Density 0.023%