INDEX
Explanations
Terms related to LGBTQ identities and issues, with a particular focus on the word "gay."
Negative Logits
Token            Logit
laps             -0.17
.scalablytyped   -0.17
uzey             -0.16
anmar            -0.15
ngr              -0.15
incinn           -0.15
gaard            -0.14
sel              -0.14
inous            -0.14
«ĺ               -0.14
Positive Logits
Token            Logit
-friendly        0.16
ness             0.16
ÙĤب              0.15
aku              0.15
uche             0.14
friendly         0.14
toupper          0.14
opoly            0.14
/trans           0.13
Kaplan           0.13
Activations Density: 0.022%
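
The density figure can be read as the share of token positions on which the feature fires. The following is a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard or any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (positions where the feature fires) / (all positions).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 10,000 token positions, 2 of which activate the feature.
sample = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.020%"
```

Under this reading, a density of 0.022% would correspond to roughly 22 activating positions per 100,000 tokens.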