INDEX
    Explanations

    references to gender identity, sexual orientation, and related education topics

    Negative Logits
    Token             Logit
    "eyn"             -0.16
    " Sud"            -0.14
    " sud"            -0.14
    "_splits"         -0.14
    ".masks"          -0.14
    "upos"            -0.14
    " speculative"    -0.14
    "Sibling"         -0.14
    "igg"             -0.14
    "inals"           -0.14
    Positive Logits
    Token             Logit
    " sex"            0.68
    " Sex"            0.54
    "sex"             0.50
    " SEX"            0.48
    "-sex"            0.48
    "Sex"             0.47
    "_sex"            0.44
    ".sex"            0.43
    " sexual"         0.38
    "SEX"             0.38
    Activation Density: 0.085%

    No Known Activations