INDEX
    Negative Logits
    Token        Logit
    "ray"        -0.07
    "prak"       -0.07
    "Lee"        -0.06
    "غن"         -0.06
    "(ne"        -0.06
    " США"       -0.06
    (not shown)  -0.06
    " Zuk"       -0.06
    " begr"      -0.06
    " Ves"       -0.06
    Positive Logits
    Token        Logit
    (not shown)  0.07
    "通過"       0.07
    (not shown)  0.06
    "就是"       0.06
    " Cute"      0.06
    " สถาน"      0.06
    " abortion"  0.06
    (not shown)  0.06
    " gaan"      0.06
    " oluştur"   0.06
    Activation Density: 0.003%

    No Known Activations