    Negative Logits
    Token             Logit
    " privilege"      -0.08
    " pile"           -0.08
    " publisher"      -0.07
    " piles"          -0.07
    " हित"            -0.07
    " intensiv"       -0.07
    " Klein"          -0.07
    " mir"            -0.07
    " moed"           -0.07
    " lamp"           -0.07
    Positive Logits
    Token             Logit
    ".Raw"            0.08
    "IMP"             0.08
    "_delta"          0.07
    "Kos"             0.07
    " Score"          0.07
    ".impl"           0.07
    " truc"           0.07
    "OOSE"            0.07
    " compartilhar"   0.07
    "评分"            0.07
    Activation Density: 0.017%

    No Known Activations