    Negative Logits
    " levert"         -0.08
    "리를"            -0.08
    " bulun"          -0.07
    "чна"             -0.07
    " hurt"           -0.07
    "Hur"             -0.07
    (token not shown) -0.07
    (token not shown) -0.07
    "Nie"             -0.07
    "armaceutical"    -0.07
    Positive Logits
    " Spots"          0.09
    " spots"          0.09
    " cells"          0.08
    " Tet"            0.08
    " wszystkim"      0.08
    " sky"            0.08
    "-painted"        0.08
    " остров"         0.07
    "adecimal"        0.07
    " విమ"            0.07
    Activation Density: 0.012%

    No Known Activations