INDEX
    Explanations
    Negative Logits
    (target         -0.07
     sno            -0.06
     moot           -0.06
     snd            -0.06
     msm            -0.06
    (st             -0.06
    (Debug          -0.06
                    -0.06
    (after          -0.06
    (sym            -0.06
    Positive Logits
    lop             0.07
     های            0.07
    sur             0.07
     agricultural   0.07
    suz             0.07
    UBLISH          0.06
    pun             0.06
    top             0.06
    uv              0.06
    sup             0.06
    Act Density 0.002%

    No Known Activations