INDEX
    Explanations

    references to death or injury incidents

    Negative Logits
    away
    -0.15
    apture
    -0.15
    AndWait
    -0.14
     ourselves
    -0.14
    евич
    -0.14
    nection
    -0.14
     Vid
    -0.14
     nă
    -0.13
    vid
    -0.13
    ān
    -0.13
    Positive Logits
    pek
    0.15
    bum
    0.14
    rouw
    0.14
    -peer
    0.14
    zcze
    0.14
     Destructor
    0.13
    modelName
    0.13
    ースト
    0.13
    clerosis
    0.13
    cline
    0.13
    Act Density 0.041%

    No Known Activations