INDEX
Explanations
references to LGBTQ+ pride and activism
Negative Logits
elo            -0.16
homosexuals    -0.15
aden           -0.15
olo            -0.14
aret           -0.14
olon           -0.14
etus           -0.14
Verd           -0.14
istrovství     -0.14
.Mask          -0.14
Positive Logits
rights                      0.25
pride                       0.23
IQ                          0.22
-rights                     0.22
-friendly                   0.20
(token missing in source)   0.20
Rights                      0.20
_rights                     0.20
community                   0.18
Pride                       0.18
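The two logit lists above read as the extremes of a single ranking. A minimal sketch of that ranking, assuming each value is the feature's effect on the output logit of the listed token (the page itself does not say how the values are computed). The function name `split_top_logits` and the parameter `k` are hypothetical, and the dictionary holds only a handful of the pairs shown above:

```python
def split_top_logits(effects, k=3):
    """Return (most_negative, most_positive) lists of (token, value) pairs.

    `effects` maps a token string to the feature's assumed effect on that
    token's output logit. Both lists are ordered from most to least extreme.
    """
    ranked = sorted(effects.items(), key=lambda pair: pair[1])  # ascending
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# A few token/value pairs copied from the lists above.
effects = {
    "rights": 0.25,
    "pride": 0.23,
    "community": 0.18,
    "elo": -0.16,
    "homosexuals": -0.15,
    "olo": -0.14,
}

negative, positive = split_top_logits(effects, k=2)
print("negative:", negative)
print("positive:", positive)
```

With the full vocabulary in `effects` and `k=10`, the same ranking would yield two ten-entry lists in the layout used above.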
Activations Density 0.024%
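For scale, 0.024% corresponds to roughly 24 active tokens per 100,000. A minimal sketch of that calculation, assuming density means the fraction of sampled tokens on which the feature's activation is above zero (the page does not define the term). The function `activation_density`, its `threshold` parameter, and the sample data are hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)


# Synthetic sample: 24 active tokens out of 100,000.
sample = [0.0] * 99_976 + [1.5] * 24
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.024%
```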