    Explanations

    abuse, harassment

    Negative Logits
    ونت            -0.06
     bx            -0.06
     гум           -0.06
                   -0.06
    Coordinator    -0.06
    Cole           -0.06
     tịch          -0.06
    osals          -0.06
     Nietzsche     -0.06
     Lig           -0.05
    Positive Logits
     travail       0.07
    isNew          0.07
    der            0.06
     Toledo        0.06
     Hospital      0.06
     tornado       0.06
     Crusher       0.06
     operate       0.06
     AFTER         0.06
    restaurants    0.06
    Activation Density: 0.012%

    No Known Activations