INDEX
    Explanations

    references to LGBTQ+ topics

    terms related to gay identity and issues

    Negative Logits
    Token            Logit
    urers            -0.76
    è¦ļéĨĴ           -0.71
    âĵĺ              -0.68
    arily            -0.67
    Condition        -0.67
    Reviewer         -0.66
    Dur              -0.65
    )=(              -0.64
    PsyNetMessage    -0.62
    effective        -0.61
    Positive Logits
    Token            Logit
     marriage        1.00
    dar              0.97
    atri             0.95
    lord             0.93
    glers            0.93
     pride           0.91
     porn            0.90
     couples         0.88
     rights          0.88
     slurs           0.87
    Activation Density: 0.024%

    No Known Activations
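
Two of the negative-logit tokens listed above (`è¦ļéĨĴ` and `âĵĺ`) look like raw byte-level BPE strings rather than readable text. The sketch below assumes the vocabulary uses the GPT-2 style byte-to-unicode mapping, which the page itself does not state; the function names are hypothetical helpers written for this illustration, not part of any dashboard API. Under that assumption, each character stands for one byte, and the bytes can be reassembled and decoded as UTF-8.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable character table (assumed here)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(token):
    """Map each character back to its byte, then decode the bytes as UTF-8."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # A single token may hold an incomplete UTF-8 sequence, so replace bad bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Token strings copied from the Negative Logits list, written as escapes.
    for tok in ["\u00e8\u00a6\u013c\u00e9\u0128\u0134", "\u00e2\u0135\u013a"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, the first token decodes to the Japanese word 覚醒 and the second to the circled-i symbol ⓘ. Tokens made only of printable ASCII, such as `urers` or `PsyNetMessage`, pass through unchanged.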