INDEX
    Explanations
    Negative Logits
     pParent        -0.08
     forward        -0.07
     #__            -0.07
     forwarded      -0.07
     motivation     -0.07
     choosing       -0.06
     Agent          -0.06
    (reinterpret    -0.06
    ossed           -0.06
    MainThread      -0.06
    Positive Logits
    还是             0.07
    하면서           0.07
    :Is              0.07
     ici             0.07
    jis              0.07
    Bes              0.07
    0.07             0.07
    —is              0.07
     is              0.07
    Act Density 0.024%

    No Known Activations