INDEX
    Explanations
    Negative Logits
        ".sum"            -0.07
        "-sensitive"      -0.07
        " таким"          -0.06
        (not rendered)    -0.06
        "INA"             -0.06
        "taboola"         -0.06
        " altura"         -0.06
        "agina"           -0.06
        "fir"             -0.06
        " anticipate"     -0.06
    Positive Logits
        " overd"          0.06
        " Çocuk"          0.06
        " OPERATION"      0.06
        (not rendered)    0.06
        (not rendered)    0.06
        " označ"          0.06
        (not rendered)    0.06
        "\temit"          0.06
        "icolor"          0.06
        " underwater"     0.06
    Activation Density 0.020%

    No Known Activations