    Negative Logits
    Token          Logit
    " مربوط"       -0.07
    "Partner"      -0.07
    " hod"         -0.07
    " کد"          -0.07
    " пот"         -0.07
    " Nobody"      -0.06
    ".dp"          -0.06
    " tert"        -0.06
    " теч"         -0.06
    " tul"         -0.06

    Positive Logits
    Token          Logit
    " scholars"    0.06
    "rete"         0.06
    "cpt"          0.06
    " considered"  0.06
    "ورن"          0.06
    " darkest"     0.06
    "xfc"          0.06
    " inning"      0.05
    ".socket"      0.05
    "'e"           0.05

    Activation Density: 0.002%

    No Known Activations