    Explanations

    terms related to sexual orientation, sexual harassment, sexual violence, and discrimination

    Negative Logits
    "<bos>"               -3.20
    (token not rendered)  -0.85
    "/**"                 -0.84
    "/*---"               -0.80
    "/*++"                -0.74
    "<?"                  -0.74
    " do"                 -0.72
    " raise"              -0.69
    "addCriterion"        -0.67
    "SourceChecksum"      -0.67

    Positive Logits
    " ftu"                1.82
    " sovere"             1.78
    " stockholm"          1.75
    " Juf"                1.75
    " fta"                1.72
    " Augu"               1.70
    " thut"               1.66
    " Intere"             1.62
    " disagre"            1.60
    " increa"             1.60
    Activation Density: 0.091%
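    Activation density is not defined on the page. Assuming it is the percentage of sampled tokens on which the feature's activation is above zero, a density of 0.091% corresponds to about 91 firing tokens per 100,000. The helper below, `activation_density`, is a hypothetical illustration of that definition.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 91 firing tokens out of 100,000 gives roughly 0.091 (percent).
sample = [1.0] * 91 + [0.0] * (100_000 - 91)
density = activation_density(sample)
```
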

    No Known Activations