    Negative Logits
    Token             Logit
    " Reeves"         -0.07
    "estead"          -0.07
    " restitution"    -0.07
    ".Reference"      -0.07
    " satisfactory"   -0.07
    " opráv"          -0.06
    " Bunun"          -0.06
    " انت"            -0.06
    " být"            -0.06
    "إن"              -0.06
    Positive Logits
    Token             Logit
    " opposite"       0.10
    "//}↵"            0.07
    "(nome"           0.06
    " assume"         0.06
    " plug"           0.06
    "_publish"        0.06
    " //}↵"           0.06
    " unittest"       0.06
    ".tom"            0.06
    "coh"             0.06
    Activation Density: 0.008%

    No Known Activations