    Negative Logits
    Logit   Token
    -0.07   "Static"
    -0.07   " خورد"
    -0.06   "^["
    -0.06   "看着"
    -0.06   "erving"
    -0.06   " spell"
    -0.06   "ического"
    -0.06   " fasting"
    -0.06   " sparking"
    -0.06   "应该"
    Positive Logits
    Logit   Token
    0.07    "(wait"
    0.06    " شرح"
    0.06    " citations"
    0.06    ".Patient"
    0.06    " tỉ"
    0.06    "(filters"
    0.06    " мяг"
    0.06    "_dates"
    0.06    " моря"
    0.06    " Bron"
    Activation Density: 0.031%

    No Known Activations