INDEX
    Explanations
    Negative Logits
    " Mana"          -0.07
    " tow"           -0.07
    " factor"        -0.07
    "-On"            -0.07
    " useCallback"   -0.07
    " ile"           -0.07
    "_translate"     -0.07
    "igraph"         -0.06
    "­s"             -0.06
    " ראשון"         -0.06
    Positive Logits
    " profits"         0.07
    " lợi"             0.07
    (token not rendered)  0.07
    "kins"             0.06
    " poster"          0.06
    " accumulating"    0.06
    " designing"       0.06
    "渔业"             0.06
    " Potter"          0.06
    "references"       0.06
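The page does not say how the negative and positive logit lists are produced. A common approach for sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the lowest and highest entries. The sketch below assumes that approach and uses pure Python. `top_logit_effects`, `w_dec_row` (the feature's decoder vector) and `w_u` (the unembedding, shape d_model x vocab_size) are hypothetical names, not any particular library's API.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    w_dec_row: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Assumed method: logit effect of one feature = decoder row @ unembedding.

    w_dec_row: hypothetical decoder vector for the feature, length d_model
    w_u:       hypothetical unembedding matrix, d_model rows x vocab_size columns
    vocab:     token strings, one per unembedding column
    Returns (most_negative, most_positive), each a list of (token, value) pairs.
    """
    d_model = len(w_dec_row)
    vocab_size = len(vocab)
    # logits[j] = sum_i w_dec_row[i] * w_u[i][j]
    logits = [
        sum(w_dec_row[i] * w_u[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(vocab_size), key=lambda j: logits[j])
    most_negative = [(vocab[j], logits[j]) for j in order[:k]]
    most_positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens.
    vocab = ["a", "b", "c", "d"]
    w_u = [[1.0, -1.0, 0.0, 2.0], [0.0, 0.0, 1.0, 0.0]]
    neg, pos = top_logit_effects([1.0, 0.5], w_u, vocab, k=2)
    print("Negative:", neg)
    print("Positive:", pos)
```

With the toy inputs, the effects are a = 1.0, b = -1.0, c = 0.5 and d = 2.0. The function therefore returns `[("b", -1.0), ("c", 0.5)]` as the negative list and `[("d", 2.0), ("a", 1.0)]` as the positive list. Real dashboards may also normalize or rescale these values, which this sketch does not attempt.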
    Act Density 0.000%

    No Known Activations
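An activation density of 0.000% matches "No Known Activations": in the sampled text, the feature was never observed firing. The sketch below assumes density means the fraction of sampled token positions where the activation is strictly above a threshold of 0. The page does not state its exact definition, so `activation_density`, `format_density` and the threshold are illustrative assumptions.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold.

    The threshold of 0.0 is an assumption; dashboards may use a different cutoff.
    """
    if not acts:
        return 0.0
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


def format_density(density: float) -> str:
    """Render a density fraction as a percentage with three decimals."""
    return f"{density * 100:.3f}%"


if __name__ == "__main__":
    print(format_density(activation_density([0.0, 0.0, 0.3, 0.0])))
    print(format_density(activation_density([0.0] * 1000)))
```

A feature that never fires on the sample gets a density of 0.0, which is displayed as "0.000%", as on this page. One firing position out of four gives 25.000%.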