INDEX
    Explanations
    NEGATIVE LOGITS
    (blank)           -0.07
    "alt"             -0.07
    " declaración"    -0.07
    "<Component"      -0.06
    "átku"            -0.06
    " Ihre"           -0.06
    "akt"             -0.06
    " zdarma"         -0.06
    (blank)           -0.06
    " Чем"            -0.06
    POSITIVE LOGITS
    " guess"          0.14
    " guessing"       0.13
    " Guess"          0.11
    " guessed"        0.10
    " guesses"        0.10
    "Guess"           0.10
    "guess"           0.08
    " Straw"          0.07
    " vaccination"    0.06
    " Grace"          0.06
    Activation Density: 0.007%

    No Known Activations