INDEX
    Explanations

    derogatory or offensive language

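    Negative Logits (tokens are quoted; a leading space is part of the token)
        "ArgsConstructor"      -0.58
        " modernize"           -0.55
        "VIAF"                 -0.55
        " ویکی‌پدیای"          -0.54   (Persian, "Wikipedia's")
        " Ders"                -0.53
        "原始内容存档于"        -0.52   (Chinese, "archived from the original on")
        "PROBE"                -0.51
        "RTGC"                 -0.51
        "بوابة"                -0.51   (Arabic, "portal")
        "Tall"                 -0.49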

    Positive Logits
        " shit"                 1.97
        " crap"                 1.72
        " SHIT"                 1.71
        " Shit"                 1.68
        "shit"                  1.66
        "Shit"                  1.61
        " shite"                1.45
        " shits"                1.45
        "crap"                  1.31
        "Crap"                  1.25
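The dashboard does not say how these logit values are computed. A common approach for sparse-autoencoder feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and then rank tokens by the result. The sketch below uses hypothetical helper names (`logit_effects`, `top_and_bottom`) and toy weights. It does not use this feature's real parameters.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float], unembed: List[List[float]], vocab: List[str]
) -> Dict[str, float]:
    """Project a feature's decoder direction through an unembedding matrix.

    Assumed layout: unembed is d_model x vocab_size (one row per residual
    dimension, one column per token). Returns token -> logit contribution
    per unit of feature activation.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal d_model")
    effects: Dict[str, float] = {}
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with this token's unembedding column.
        effects[token] = sum(d * row[t] for d, row in zip(decoder_dir, unembed))
    return effects


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens (values are made up).
    vocab = [" shit", " crap", " modernize", "VIAF"]
    unembed = [[2.0, 1.5, -0.5, 0.0], [0.0, -0.5, 0.2, 1.0]]
    eff = logit_effects([1.0, -0.5], unembed, vocab)
    pos, neg = top_and_bottom(eff, 2)
    print("positive:", pos)
    print("negative:", neg)
```

Under this assumption, every listed value is that token's logit shift per unit of feature activation. Larger activations scale all of them linearly.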
    Activation Density: 0.381%
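This sketch assumes "activation density" means the percentage of tokens in an evaluation corpus on which the feature's activation is above a threshold. The dashboard does not state the corpus or the threshold, so zero is assumed. The helper name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy data: 4 active tokens out of 1,000.
    acts = [0.0] * 996 + [1.2] * 4
    print(f"{activation_density(acts):.3f}%")
```

Under this definition, a density of 0.381% corresponds to roughly 381 active tokens per 100,000.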

    No Known Activations