    Negative Logits
    Token          Logit
    "Init"         -0.07
    " Winners"     -0.07
    (not shown)    -0.07
    " Δ"           -0.07
    " Mom"         -0.07
    "Δ"            -0.06
    "_Manager"     -0.06
    " KB"          -0.06
    " Dumpster"    -0.06
    " homes"       -0.06
    Positive Logits
    Token          Logit
    " cal"         0.09
    "吸纳"         0.07
    "造血"         0.07
    " rekl"        0.07
    " Electric"    0.07
    "onitor"       0.07
    "rese"         0.06
    "emat"         0.06
    " electrical"  0.06
    "_alg"         0.06
    Activation Density: 0.008%

    No Known Activations