INDEX
    Explanations
    Negative Logits
    " آزاد"     -0.06
    "weeney"    -0.06
    "_HT"       -0.06
    "ísk"       -0.06
    " tekst"    -0.06
    "Tem"       -0.06
    "Word"      -0.06
    "DOG"       -0.06
    " dan"      -0.06
    "صه"        -0.06
    Positive Logits
    ".email"        0.07
    " smash"        0.06
    "irthday"       0.06
    "splice"        0.06
    "/Common"       0.06
    " upsetting"    0.06
                    0.06
    " mixin"        0.06
    " fragment"     0.06
    "λικ"           0.06
    Act Density 0.090%

    No Known Activations