    Negative Logits
    Token             Logit
    "Preis"           -0.08
    " substantial"    -0.08
    " kuit"           -0.08
    " endogenous"     -0.08
    " hind"           -0.07
    " kao"            -0.07
    " industry's"     -0.07
    (blank)           -0.07
    "wem"             -0.07
    " 역시"           -0.07
    Positive Logits
    Token             Logit
    "unordered"       0.08
    " favorito"       0.07
    " último"         0.07
    (blank)           0.07
    " awesome"        0.07
    " nginx"          0.07
    " সংগ"            0.07
    "itial"           0.07
    "_CAN"            0.07
    " fwrite"         0.07
    Activation Density: 0.021%

    No Known Activations