INDEX
    Explanations
    Negative Logits
     Token               Logit
     disadv              -0.09
     exile               -0.08
     vague               -0.08
     crem                -0.08
     presented           -0.08
     кров                -0.07
     transplantation     -0.07
     grazing             -0.07
     recicl              -0.07
     cav                 -0.07
    Positive Logits
     Token               Logit
     finalists           0.08
     Unite               0.08
     docs                0.08
     (token not shown)   0.08
     ailes               0.08
     Values              0.07
     Strings             0.07
     |_|                 0.07
     Jun                 0.07
     CHANGE              0.07
    Act Density 0.002%

    No Known Activations