    Negative Logits
    Token                        Logit
    " clips"                     -0.09
    " percepción"                -0.08
    "oraa"                       -0.08
    " Eclipse"                   -0.08
    (token missing in source)    -0.07
    "insu"                       -0.07
    " clip"                      -0.07
    "kort"                       -0.07
    " Eddy"                      -0.07
    " infiltration"              -0.07
    Positive Logits
    Token                        Logit
    (token missing in source)    0.08
    "(state"                     0.08
    ")/("                        0.08
    "Change"                     0.07
    " уход"                      0.07
    "احب"                        0.07
    "верх"                       0.07
    "Another"                    0.07
    " арас"                      0.07
    " аж"                        0.07
    Activation Density: 0.008%

    No Known Activations