INDEX
    Explanations

    terms related to gender and sexual identity

    Negative Logits
    ÏĢον      -0.09
    reo       -0.08
    ious      -0.07
    apter     -0.07
     evet     -0.07
    rias      -0.07
    Gratis    -0.07
    istica    -0.07
    VOKE      -0.07
    plies     -0.07
    Positive Logits
    uchen     0.06
    or        0.06
    ones      0.06
    aha       0.06
    orge      0.06
     Johns    0.05
    orr       0.05
    102       0.05
    BA        0.05
     equip    0.05
    Activation Density: 0.002%

    No Known Activations