INDEX
    Negative Logits
     anlaş            -0.07
    (dw               -0.07
     UDP              -0.07
     london           -0.06
     fuel             -0.06
    ันย               -0.06
     differed         -0.06
    white             -0.06
     ц                -0.06
    Messenger         -0.06
    Positive Logits
    CENT              0.07
    nout              0.06
    installer         0.06
     отношения        0.06
    CoreApplication   0.06
    attro             0.06
                      0.06
     kullanıcı        0.06
    recommend         0.06
    Ont               0.06
    Act Density 0.013%

    No Known Activations