    NEGATIVE LOGITS
    "atre"     -0.15
    "otte"     -0.15
    "egen"     -0.15
    " Musk"    -0.14
    "etter"    -0.14
    "aska"     -0.14
    "alsa"     -0.14
    " när"     -0.13
    " AES"     -0.13
    "elm"      -0.13
    POSITIVE LOGITS
    "akan"     0.16
    "ä¿"       0.16
    "ded"      0.15
    "/her"     0.15
    "판"       0.14
    "panic"    0.14
    "же"       0.14
    "ainers"   0.14
    " diễn"    0.14
    "τυ"       0.14
    Activation Density: 0.574%

    No Known Activations