INDEX
    Explanations

    references to inclusive language regarding people

    Negative Logits
    "ively"          -0.16
    "ibe"            -0.16
    "ible"           -0.15
    "ibur"           -0.14
    "ickerView"      -0.14
    "gren"           -0.14
    "undle"          -0.14
    " everlasting"   -0.14
    "ibly"           -0.14
    "IBE"            -0.14
    Positive Logits
    " else"          0.21
    "onymous"        0.17
    "алов"           0.16
    "_else"          0.15
    "adesh"          0.14
    "orton"          0.14
    "ëĬ¥"            0.14
    "кид"            0.14
    "ultipart"       0.14
    "تÙĤ"            0.14
    Act Density 0.018%

    No Known Activations