INDEX
    Explanations

    percentage differences

    Negative Logits
    Token           Logit
    (whitespace)    -0.08
    "ến"            -0.08
    " the"          -0.08
    " of"           -0.08
    "ೂರ"            -0.07
    "ום"            -0.07
    " MER"          -0.07
    " in"           -0.07
    ".js"           -0.07
    ".hr"           -0.07

    Positive Logits
    Token           Logit
    " Gucci"        0.09
    " CGContext"    0.09
    "qat"           0.09
    " وعلى"         0.08
    " labi"         0.08
    " Gujar"        0.08
    "naan"          0.08
    "tog"           0.08
    " witte"        0.08
    " Botox"        0.08

    Activation Density: 0.050%

    No Known Activations