INDEX
Explanations
references to LGBTQ+ topics
terms related to gay identity and issues
Negative Logits
urers            -0.76
覚醒             -0.71
ⓘ                -0.68
arily            -0.67
Condition        -0.67
Reviewer         -0.66
Dur              -0.65
)=(              -0.64
PsyNetMessage    -0.62
effective        -0.61
Positive Logits
marriage         1.00
dar              0.97
atri             0.95
lord             0.93
glers            0.93
pride            0.91
porn             0.90
couples          0.88
rights           0.88
slurs            0.87
Activations Density 0.024%