INDEX
    Negative Logits
     بنابراین
    -0.09
    цуз
    -0.08
    -0.07
     زن
    -0.06
     Honour
    -0.06
     corros
    -0.06
    átis
    -0.06
     '<%=
    -0.06
    estyle
    -0.06
     sürede
    -0.06
    Positive Logits
     menu
    0.08
     Menu
    0.08
    Menu
    0.08
     universal
    0.07
    428
    0.06
     sentinel
    0.06
    menus
    0.06
    				      
    0.06
    christ
    0.06
     Đầu
    0.06
    Activation Density: 0.004%

    No Known Activations