INDEX
    Explanations

    phrases indicating critiques or negative assessments of systems and organizations

    Negative Logits
    Token         Logit
    " Marker"     -0.17
    "رد"          -0.16
    "sh"          -0.16
    "gons"        -0.15
    " Burnett"    -0.15
    "ot"          -0.14
    " wel"        -0.14
    "AUTH"        -0.14
    "ertos"       -0.14
    " DÄĽ"        -0.14
    Positive Logits
    Token         Logit
    " undo"       0.20
    "eday"        0.17
    "344"         0.15
    "164"         0.15
    "arse"        0.15
    "jsonp"       0.15
    "å±Ģ"         0.14
    " karak"      0.14
    "/modal"      0.14
    "zek"         0.14
    Activation Density: 0.103%

    No Known Activations