INDEX
    Explanations

    references to gender identity and expressions related to faith and societal norms

    Negative Logits
    Token           Logit
    "znik"          -0.14
    "acaÄŁ"         -0.13
    "ichten"        -0.13
    "reck"          -0.13
    " LIABLE"       -0.13
    "quit"          -0.12
    "innitus"       -0.12
    "íĴĪ"           -0.12
    " domest"       -0.12
    " åĿ"           -0.12
    Positive Logits
    Token           Logit
    " gender"       0.42
    " Gender"       0.38
    " transgender"  0.38
    " genders"      0.34
    "Gender"        0.34
    "gender"        0.34
    " genitals"     0.31
    " sex"          0.31
    " genital"      0.30
    " Assigned"     0.30
    Activation Density: 0.070%

    No Known Activations