    NEGATIVE LOGITS
    Token             Logit
    "reate"           -0.07
    " thấp"           -0.07
    (token missing)   -0.07
    " behaving"       -0.06
    " bipartisan"     -0.06
    "macı"            -0.06
    " defiant"        -0.06
    "rozen"           -0.06
    "cerr"            -0.06
    " LNG"            -0.06
    POSITIVE LOGITS
    Token             Logit
    " هواپیم"         0.07
    "xffffff"         0.07
    "(power"          0.06
    "863"             0.06
    " sadece"         0.06
    " Hind"           0.06
    " yoktur"         0.06
    " lưu"            0.06
    " examiner"       0.06
    "=find"           0.06
    Activation Density: 0.000%

    No Known Activations