    Negative Logits
    Token             Logit
    "wayi"            -0.09
    "ечь"             -0.07
                      -0.07
    "หน"              -0.07
    " apar"           -0.07
    " root"           -0.07
    " skip"           -0.07
    "куда"            -0.07
    "dade"            -0.07
    "ڈ"               -0.07
    Positive Logits
    Token             Logit
    " unequiv"        0.11
    "回应"            0.10
    " unsolicited"    0.10
    " announces"      0.08
    " narrowly"       0.08
    "宣布"            0.08
    " manifesto"      0.08
    "修改"            0.08
    " reaffirm"       0.08
    " definit"        0.08
    Activation Density: 0.006%

    No Known Activations