INDEX
    Explanations

    references to the LGBTQ+ community

    terms related to homosexuality

    Negative Logits
    utions       -0.68
     Mehran      -0.65
     shroud      -0.65
     Weir        -0.64
    çĦ           -0.63
     Nile        -0.61
     screws      -0.61
     Spur        -0.60
     fir         -0.59
     negatives   -0.59
    Positive Logits
    emade        1.68
    osexual      1.55
    ework        1.47
    estead       1.46
    icide        1.32
    eless        1.29
    eland        1.20
    eline        1.06
    ogeneous     1.06
    eworld       1.05
    Activation Density 0.023%

    No Known Activations