    Negative Logits
    Token               Logit
    " encouragement"    -0.07
    " imposing"         -0.07
    ".Col"              -0.07
    " plug"             -0.06
    " Ming"             -0.06
    "-fed"              -0.06
    " потому"           -0.06
    " Finite"           -0.06
    "ecessary"          -0.06
    "合同"              -0.06
    Positive Logits
    Token               Logit
    ".wrapper"          0.07
    "www"               0.07
    "bbox"              0.07
    " повіт"            0.06
    " сопров"           0.06
    "ucky"              0.06
    " Jurassic"         0.06
    "OCUMENT"           0.06
    "(component"        0.06
    "ै,"                0.06
    Activation Density: 0.002%

    No Known Activations