    Negative Logits
     their              -0.08
     thiên              -0.06
     Ye                 -0.06
     Yan                -0.06
    (token not shown)   -0.06
    (token not shown)   -0.06
    (token not shown)   -0.06
     Є                  -0.06
    ايات                -0.06
     Neu                -0.06
    Positive Logits
    ournament           0.07
    trim                0.07
     awful              0.07
    oped                0.06
     plugin             0.06
    .addColumn          0.06
    Pokemon             0.06
     dof                0.06
    Classifier          0.06
    roducing            0.06
    Activation Density: 0.015%

    No Known Activations