INDEX
    Explanations
    NEGATIVE LOGITS
    ność  -0.10
    quired  -0.08
    ngor  -0.07
    工作人员  -0.07
    _boolean  -0.07
    ffected  -0.07
     يؤدي  -0.07
    sid  -0.07
    -0.07
    -0.07
    POSITIVE LOGITS
    ông  0.09
     choose  0.08
     wisely  0.08
     muốn  0.08
     piece  0.08
     conceived  0.08
     nội  0.08
     pilihan  0.08
     Choose  0.08
     hmm  0.08
    Activation Density: 0.046%

    No Known Activations