INDEX
    Explanations

    discriminatory language related to sexual orientation, gender identity, and civil rights

    Negative Logits
    Token         Logit
     fays         -0.72
     feen         -0.70
     endom        -0.70
     Juf          -0.67
     fign         -0.66
     Pfal         -0.63
     Dés          -0.63
     fua          -0.62
     fince        -0.62
     sonne        -0.61
    Positive Logits
    Token         Logit
    <bos>         0.64
    SneakyThrows  0.62
    nationality   0.56
    niająca       0.53
    üedad         0.52
    KELEY         0.49
    Fitment       0.48
     agences      0.48
    pañas         0.48
     Walkover     0.47
    Activation Density: 0.291%

    No Known Activations