INDEX
    Explanations

    derogatory or obscene language

    Negative Logits
    sou       -0.17
    ists      -0.16
    ign       -0.16
    borg      -0.14
    onder     -0.14
    imers     -0.14
    argins    -0.14
    803       -0.14
    iom       -0.14
    zet       -0.14

    Positive Logits
    assic      0.15
    IFO        0.15
     anale     0.14
    ãĦ         0.14
    rgan       0.14
    nop        0.14
    gis        0.14
    PTS        0.14
    aukee      0.13
    anske      0.13
    Activation Density: 0.060%

    No Known Activations