INDEX
    Explanations
    Negative Logits
    Token             Logit
    " on"             -1.19
    " With"           -1.01
    " artificiales"   -0.96
    " in"             -0.96
    " July"           -0.96
    (blank)           -0.91
    "!"               -0.90
    " Erziehung"      -0.90
    " it"             -0.88
    " went"           -0.88
    Positive Logits
    Token             Logit
    " products"       1.13
    "Kde"             1.10
    " erhi"           1.04
    "comme"           1.03
    " duong"          1.02
    " crescente"      1.02
    " effetto"        1.00
    " tivi"           0.99
    " distinguer"     0.99
    (blank)           0.99
    Act Density 0.053%

    No Known Activations