    Negative Logits
    Token           Logit
    " resisting"    -0.06
    ".Restrict"     -0.06
    "ضافة"          -0.06
    " Hd"           -0.06
    " KO"           -0.06
    "Flutter"       -0.06
    " admired"      -0.06
    " ساعت"         -0.06
    "ちょ"          -0.06
    " krát"         -0.06
    Positive Logits
    Token           Logit
    " Вар"          0.07
    " groundwork"   0.06
    " athleticism"  0.06
    ",max"          0.06
    "-sub"          0.06
    " Dick"         0.06
    " اسلامی"       0.06
    " PAR"          0.06
    "ção"           0.06
    " epit"         0.06
    Activation Density: 0.208%

    No Known Activations