INDEX
Explanations
mentions of LGBTQ organizations or related terms
references to the queer community and related terminology
Negative Logits
ãĥŁ             -0.74
GOODMAN         -0.71
ãĥ¼ãĥĨãĤ£       -0.69
é¾įå            -0.68
batting         -0.66
ERSON           -0.65
ãĤ¼ãĤ¦ãĤ¹       -0.65
uania           -0.65
eleph           -0.65
McDonnell       -0.64
Positive Logits
zon             1.14
Que             1.00
erness          0.99
que             0.98
Que             0.97
ues             0.94
bec             0.90
eg              0.88
edo             0.85
ue              0.84
Activations Density: 0.007%