    Explanations

    This neuron detects instruction phrases about completing the user’s request (e.g., “completes the request”).

    Negative Logits
    landa         -0.06
    stances       -0.06
    macros        -0.06
     pistols      -0.06
                  -0.06
     dbus         -0.06
    经验          -0.06
                  -0.06
    ýval          -0.06
    (that         -0.06
    Positive Logits
    ODULE         0.07
     vulner       0.06
    ξει           0.06
     pw           0.06
    owntown       0.06
     konum        0.06
    وروب          0.06
     downside     0.06
    ünd           0.06
    .ViewGroup    0.06
    Activation Density: 0.002%

    No Known Activations