    Negative Logits
     "          -1.13
    Nhưng       -1.10
     termurah   -1.05
     Trabalho   -1.03
     touche     -1.03
     Técnicas   -1.01
    军队        -1.01
     Viena      -1.00
     Portugu    -0.99
    labas       -0.98
    Positive Logits
     måneder    1.18
    igung       1.08
     anunció    1.05
     årene      1.03
     voks       1.03
     onsdag     1.02
     børn       1.00
     virke      1.00
    리아        0.99
     skues      0.98
    Activation Density: 0.330%

    No Known Activations