    Negative Logits
    Token            Logit
    'azo'            -0.07
    'ежду'           -0.07
    ' electrode'     -0.06
    'лара'           -0.06
    ' проти'         -0.06
    ' standoff'      -0.06
    ' noses'         -0.06
    'aze'            -0.06
    'lique'          -0.06
    Positive Logits
    Token            Logit
    ' Fairfax'       0.07
    '.partial'       0.07
    '(Core'          0.06
    ' Bought'        0.06
    '_eps'           0.06
    ' Spam'          0.06
    '(isinstance'    0.06
    ':@"%@'          0.06
    'Fashion'        0.06
    ' patched'       0.06
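    Dashboards of this kind usually derive logit lists like the two above by projecting the feature's decoder direction onto the model's unembedding matrix, then reading off the smallest and largest entries. The following is a minimal sketch that assumes that convention. The toy vocabulary, `w_dec`, and `W_U` are hypothetical stand-ins and are not this feature's actual weights.

```python
import numpy as np

def top_logit_effects(w_dec: np.ndarray, W_U: np.ndarray, vocab: list[str], k: int):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: w_dec is one feature's decoder vector, shape [d_model], and
    W_U is an unembedding matrix, shape [d_model, d_vocab].
    """
    logits = w_dec @ W_U          # direct effect of the feature on every token's logit
    order = np.argsort(logits)    # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["azo", " electrode", " Fairfax", ".partial"]
w_dec = np.array([1.0, 0.0])
W_U = np.array([[-0.07, -0.06, 0.07, 0.05],
                [ 0.00,  0.00, 0.00, 0.00]])
neg, pos = top_logit_effects(w_dec, W_U, vocab, k=2)
print(neg)  # most negative first
print(pos)  # most positive first
```

    Sorting once and slicing both ends yields the two lists in a single pass. Leading spaces are kept in the token strings because byte-level BPE tokenizers treat ' electrode' and 'electrode' as different tokens.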
    Activation density: 0.016%

    No Known Activations
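    Activation density is usually reported as the percentage of sampled tokens on which the feature fires. At 0.016%, that works out to roughly 16 tokens in every 100,000. The following is a minimal sketch that assumes a simple "activation greater than zero" firing threshold. The activation array is made up for illustration.

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed firing rule)."""
    return 100.0 * float(np.mean(acts > threshold))

# Hypothetical activations: 16 nonzero values among 100,000 tokens.
acts = np.zeros(100_000)
acts[:16] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.016%
```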