    Negative Logits
    Token         Logit
    "Delete"      -0.07
    " فیلم"       -0.07
    "同じ"        -0.07
    " emiss"      -0.07
    "owl"         -0.07
    " student"    -0.07
    "(first"      -0.07
    " diver"      -0.06
    "/loading"    -0.06
    " Heavy"      -0.06
    Positive Logits
    Token          Logit
    ".Txt"         0.07
    " sensed"      0.07
    " ipv"         0.06
    " TASK"        0.06
    "٨"            0.06
    "Й"            0.06
    "728"          0.06
    "orestation"   0.06
    " تقد"         0.06
    " ESV"         0.06
    Activation Density: 0.009%

    No Known Activations