INDEX
    Explanations
    NEGATIVE LOGITS
    Token         Logit
    "NX"          -0.09
    " Sty"        -0.07
    "epar"        -0.07
    "(heap"       -0.06
    "oters"       -0.06
    "Located"     -0.06
    "_PB"         -0.06
    "qh"          -0.06
    "sth"         -0.06
    "Checking"    -0.06
    POSITIVE LOGITS
    Token         Logit
    "/--"         0.07
    " kron"       0.07
    " imperson"   0.06
    "人的"        0.06
    " domin"      0.06
    [missing]     0.06
    "atıcı"       0.06
    " Gerald"     0.06
    " sep"        0.06
    " bold"       0.06
    Activation Density: 0.035%

    No Known Activations