    NEGATIVE LOGITS
    Token          Logit
    "SIZ"          -0.61
    "SIN"          -0.59
    " Augusta"     -0.58
    "žin"          -0.58
    "ubehör"       -0.57
    "vänt"         -0.56
    "Sims"         -0.56
    " sjuk"        -0.56
    " Sloan"       -0.56
    "axel"         -0.56
    POSITIVE LOGITS
    Token            Logit
    " are"           1.24
    " yourselves"    1.15
    " ARE"           1.10
    " were"          1.09
    " themselves"    1.06
    "wolves"         1.01
    " WERE"          1.01
    "were"           0.99
    "themselves"     0.98
    " Were"          0.98
    Activation Density: 0.523%

    No Known Activations