INDEX
    Explanations

    words and phrases that express vulgarity or frustration

    Negative Logits
    elle      -0.20
    dl        -0.19
    elas      -0.17
    elli      -0.17
    el        -0.17
    ements    -0.16
    ellers    -0.16
    ess       -0.16
    lite      -0.16
    eler      -0.16
    Positive Logits
    sterol    0.19
    ucid      0.17
    prit      0.17
    heck      0.16
    iferay    0.16
    inese     0.16
    unteer    0.16
    abyrin    0.16
    itude     0.15
    mazon     0.15
    Activation Density: 0.060%

    No Known Activations