INDEX
    Explanations
    Negative Logits
     wsz               -0.07
    Zen                -0.06
    атков              -0.06
     nord              -0.06
     सर                -0.06
     เด                -0.06
    Naz                -0.06
    (no token text)    -0.06
     россий            -0.06
     getToken          -0.06
    Positive Logits
    faces              0.08
    UCT                0.07
     Crimes            0.07
     history           0.06
    Everybody          0.06
     History           0.06
    rar                0.06
     مقاله             0.06
    activity           0.06
     Death             0.06
    Activation Density: 0.002%

    No Known Activations