INDEX
    Negative Logits
    enschappelijke  -0.09
     simplistic     -0.08
     einfache       -0.08
    staking         -0.08
    dele            -0.07
     פשוט           -0.07
     Verb           -0.07
    简单            -0.07
    loze            -0.07
    verbose         -0.07
    Positive Logits
    Caracter        0.08
    liye            0.08
     cmp            0.08
    Guarante        0.08
     esclus         0.07
     feront         0.07
     राय            0.07
     खिलाड़ियों      0.07
    .parameters     0.07
    ೈನ              0.07
    Act Density 0.014%

    No Known Activations