    Negative Logits
        Token           Logit
        " both"         -1.66
        " that"         -1.65
        " publicó"      -1.36
        " lanzó"        -1.30
        " angefangen"   -1.28
        " many"         -1.26
        " everyone"     -1.21
        " a"            -1.20
        " zarówno"      -1.20
        " on"           -1.19

    Positive Logits
        Token           Logit
        "isations"      1.29
        " ultimi"       1.28
        "rebbe"         1.25
        " spion"        1.20
        " stessi"       1.20
        "тисти"         1.16
        "rebbero"       1.16
        " menyen"       1.16
        "勝手に"        1.15
        " laga"         1.14
    Activation Density: 0.052%

    No Known Activations