    NEGATIVE LOGITS
    .df           -0.07
     welfare      -0.06
    quivos        -0.06
    she           -0.06
    .children     -0.06
     looph        -0.06
    policy        -0.06
    NW            -0.06
     planning     -0.06
    _SIM          -0.06
    POSITIVE LOGITS
     أيض           0.07
     prote         0.07
    Inlining       0.06
     Deleting      0.06
     capturing     0.06
     singer        0.06
     Electronic    0.06
    Assert         0.06
     esp           0.06
     اصفهان        0.06
    Activation Density: 0.004%

    No Known Activations