INDEX
Explanations
mentions or references to LGBTQ+ individuals or topics
mentions of the word "gay" in relation to rights and equality
Negative Logits
âĵĺ         -0.81
KT          -0.80
Condition   -0.76
SEC         -0.72
Ct          -0.71
Reviewer    -0.71
ç«          -0.69
ICS         -0.69
nit         -0.68
ioxide      -0.67
Positive Logits
equality    1.05
gay         0.92
bisexual    0.91
atri        0.89
glers       0.87
gays        0.85
Equality    0.85
sex         0.84
dar         0.82
couples     0.82
Activations Density: 0.017%