    Negative Logits
    Token             Logit
    "нее"             -0.07
    "оды"             -0.07
    "aze"             -0.07
    "dl"              -0.06
    " viewed"         -0.06
    "ased"            -0.06
    "\tps"            -0.06
    "svm"             -0.06
    " acquainted"     -0.06
    "ген"             -0.06
    Positive Logits
    Token             Logit
    (not shown)       0.07
    " českých"        0.06
    " Peygamber"      0.06
    " proximity"      0.06
    "optgroup"        0.06
    "finally"         0.06
    " open"           0.06
    " listop"         0.06
    "्वच"             0.06
    (not shown)       0.06
    Activation Density: 0.001%

    No Known Activations