INDEX
    Explanations
    Negative Logits
    Token      Logit
    "704"      -0.08
    " және"    -0.07
    " жыл"     -0.07
    " முத"     -0.07
    " жаң"     -0.07
    "收益"     -0.07
    "714"      -0.07
    "itz"      -0.07
    "周期"     -0.07
    " ප්"      -0.07
    POSITIVE LOGITS
    Token           Logit
    " startled"     0.09
    " себя"         0.09
    " siebie"       0.09
    "Everyone"      0.08
    "নিক"           0.08
    " gern"         0.08
    "Equipe"        0.08
    "agnie"         0.08
    " surprised"    0.08
    " ogl"          0.08
    Act Density 0.007%

    No Known Activations