INDEX
    Explanations
    Negative Logits
    " trapping"       -0.07
    "_n"              -0.07
    " voluntarily"    -0.06
    " ориг"           -0.06
    " questioning"    -0.06
    "sap"             -0.06
    " Closing"        -0.06
    ".outputs"        -0.06
    "esát"            -0.06
    " thro"           -0.06
    Positive Logits
    "forcement"       0.07
    " Send"           0.06
    "Kyle"            0.06
    " Nath"           0.06
    "je"              0.06
    "your"            0.06
    " вим"            0.06
    "ме"              0.06
    ".property"       0.06
    "ें।"              0.06
    Act Density 0.000%

    No Known Activations