    Negative Logits
    impact         -0.08
    ategori        -0.08
    upi            -0.08
    ummar          -0.08
    lider          -0.08
    waż            -0.08
     MARK          -0.07
     readership    -0.07
    pes            -0.07
     Forum         -0.07
    Positive Logits
     underestimate  0.08
     minic          0.08
     காவ            0.08
                    0.08
    ವೂ              0.07
    :true           0.07
     ngon           0.07
     langen         0.07
     zvakare        0.07
     forb           0.07
    Activation Density: 0.003%

    No Known Activations