    Explanations
    Negative Logits
     yayım
    -0.07
    	Type
    -0.07
     washing
    -0.07
     Všech
    -0.07
    ологіч
    -0.06
    	em
    -0.06
     dove
    -0.06
     steals
    -0.06
    оло
    -0.06
    +.
    -0.06
    Positive Logits
    !!!
    0.07
     naz
    0.06
    0.06
    atever
    0.06
    0.06
    (dot
    0.06
    Lisa
    0.06
    0.06
     Trab
    0.06
    0.06
    Activation Density 0.022%

    No Known Activations