    Negative Logits
    Token           Logit
    "Blocking"      -0.09
    " blocking"     -0.08
    "yan"           -0.08
    " ир"           -0.08
    " shiny"        -0.08
    "Blocked"       -0.07
    "blocking"      -0.07
    " blocked"      -0.07
    "oa"            -0.07
    "Accessible"    -0.07
    Positive Logits
    Token           Logit
    " positivo"     0.11
    "/net"          0.10
    " net"          0.10
    "-positive"     0.10
    "/-"            0.10
    " imbalance"    0.09
    "/+"            0.09
    "\tnet"         0.09
    " gained"       0.09
    "-negative"     0.09
    Activation Density: 0.028%

    No Known Activations