INDEX
    Explanations
    Negative Logits
    Token          Logit
    " serr"        -0.08
    " laf"         -0.08
    " motivating"  -0.08
    " ferv"        -0.08
    " Wein"        -0.08
    " alc"         -0.07
    " depl"        -0.07
    " press"       -0.07
    " pét"         -0.07
                   -0.07
    Positive Logits
    Token          Logit
    " Keb"         0.08
    " Emb"         0.07
    " circ"        0.07
    "seq"          0.07
    " userdata"    0.07
    "Pep"          0.07
    ".gz"          0.07
    "speaker"      0.07
    "bial"         0.07
    " Bish"        0.07
    Activation Density: 0.002%

    No Known Activations