Explanations
References to LGBTQ+ themes and terminology, particularly gay pride and gay rights.
Negative Logits
Token          Logit
cker           -0.18
homosexuals    -0.16
inus           -0.15
_SECURITY      -0.15
gays           -0.15
стин           -0.15
帯             -0.14
Instances      -0.14
bjerg          -0.14
istrovství     -0.14
Positive Logits
Token          Logit
rights         0.28
dar            0.28
-rights        0.28
bor            0.26
pride          0.24
-friendly      0.23
atri           0.23
lord           0.22
Rights         0.22
RIGHTS         0.22
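Logit lists like the ones above are commonly produced by taking the dot product of a feature's decoder direction with each token's unembedding vector, then sorting. The sketch below assumes that approach and ignores any final layer norm. The decoder vector `w_dec`, the three-dimensional toy unembedding, and its values are all hypothetical and are not taken from this feature.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(w_dec: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(a * b for a, b in zip(w_dec, col)) for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most-promoted (positive) and k most-suppressed (negative) tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]

# Hypothetical toy data: a 3-dimensional residual stream and a 4-token vocabulary.
w_dec = [1.0, 0.5, 0.0]
unembed = {
    "rights": [0.2, 0.1, 0.3],
    "pride": [0.1, 0.2, 0.0],
    "cker": [-0.1, -0.2, 0.4],
    "lord": [0.0, 0.0, 1.0],
}

effects = logit_effects(w_dec, unembed)
positive, negative = top_and_bottom(effects, 2)
print("positive:", positive)
print("negative:", negative)
```

On real models the unembedding has tens of thousands of columns, so a dashboard would keep only the top and bottom handful, as in the lists above.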
Activation density: 0.025%
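Reading activation density as the percentage of tokens on which the feature's activation is above zero (an assumption about how this dashboard defines it), 0.025% works out to roughly one token in 4,000. A minimal sketch, assuming the activations are available as a flat list of floats; the sample values are invented:

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Invented example: one firing token out of 4,000 gives a density of 0.025%.
acts = [0.0] * 3999 + [1.7]
print(f"{activation_density(acts):.3f}%")
```

A density this low indicates a sparse, fairly specific feature, which fits the narrow explanation given above.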