    Negative Logits
     props             -0.08
    woods              -0.08
    _seen              -0.08
     செய்திகள்          -0.08
    seeing             -0.08
    வு                 -0.07
     mbi               -0.07
     seen              -0.07
     logo              -0.07
     dealings          -0.07
    Positive Logits
     permanently       0.10
                       0.10
    永久               0.10
     cruelty           0.09
                       0.09
     femin             0.09
     prophyl           0.09
     Perman            0.08
     surg              0.08
    -proof             0.08
    Activation Density: 0.005%

    No Known Activations