INDEX
    Explanations
    Negative Logits
    " г"  -0.06
    "nač"  -0.06
    " fwd"  -0.06
    " Official"  -0.06
    " رای"  -0.06
    " نزدیک"  -0.06
    " el"  -0.06
    " Phrase"  -0.06
    " Роб"  -0.06
    "DIS"  -0.06
    Positive Logits
    " need"  0.07
    "inand"  0.06
    "(high"  0.06
    "願い"  0.06
    " Rosie"  0.06
    "بية"  0.06
    "family"  0.06
    " appreciate"  0.06
    "gba"  0.06
    " Draco"  0.06
    Activation Density: 0.025%

    No Known Activations