INDEX
    Explanations
    NEGATIVE LOGITS
     السود
    -0.09
    UTF
    -0.08
     числе
    -0.08
     afgest
    -0.08
    ocratic
    -0.07
     Schwarz
    -0.07
    toe
    -0.07
     некоторых
    -0.07
    सर
    -0.07
    isecond
    -0.07
    POSITIVE LOGITS
    enshi
    0.08
    ']['
    0.08
    פחה
    0.08
     বিজ্ঞান
    0.08
     Diva
    0.07
    {'
    0.07
     produt
    0.07
     Vineyard
    0.07
    phe
    0.07
     especiais
    0.07
    Activation Density: 0.042%

    No Known Activations