INDEX
    Explanations

    references to same-sex relationships and sexual orientation

    Negative Logits
    .scalablytyped   -0.17
    openh            -0.15
    erek             -0.15
    quares           -0.14
     licensors       -0.14
    .wp              -0.14
    AndServe         -0.14
    ãĥ¯ãĥ¼           -0.14
    è£ľ              -0.14
    raquo            -0.13
    Positive Logits
    antan            0.17
    onas             0.15
     Sands           0.15
    arian            0.15
    vid              0.15
    asing            0.14
    fv               0.14
    nev              0.14
     parser          0.14
    wick             0.14
    Act Density 0.005%

    No Known Activations