    Explanations

    mathematical notation

    Negative Logits
    jam           -0.08
    make          -0.08
     schu         -0.08
    juk           -0.08
     hamper       -0.07
     Blogger      -0.07
    ədə           -0.07
     blogg        -0.07
    bak           -0.07
    ...",↵        -0.07
    Positive Logits
     phải         0.09
     Somehow      0.08
     trebuie      0.08
     harus        0.08
     któr         0.08
     mindestens   0.08
     Rela         0.08
    .Config       0.08
     underlying   0.07
    			  0.07
    Activation Density: 0.172%

    No Known Activations