INDEX
    Explanations
    NEGATIVE LOGITS
    Token             Logit
    " picked"         -0.07
    " memberships"    -0.07
    "ectomy"          -0.07
    "Width"           -0.06
    " millet"         -0.06
    " Optimization"   -0.06
    "ynomials"        -0.06
    " kernels"        -0.06
    " kern"           -0.06
    " Density"        -0.06
    POSITIVE LOGITS
    Token             Logit
    " despite"        0.11
    "Despite"         0.08
    " Despite"        0.08
    " spite"          0.07
    " trag"           0.07
    " trotz"          0.07
    " тяж"            0.07
    " lungs"          0.06
    " dozens"         0.06
    " undeniable"     0.06
    Activation Density: 0.008%

    No Known Activations