    Explanations

    first-person pronouns

    descriptions of an AI language model's capabilities and functionality

    This neuron activates on tokens within the assistant’s self-descriptions and capability listings, especially “I can…” statements and the list markers that accompany them.

    Negative Logits
        "stroke"               -0.07
        "pec"                  -0.06
        "_matching"            -0.06
        "增加"                 -0.06
        "purchase"             -0.06
        " MethodInvocation"    -0.06
        (whitespace)           -0.06
        "-width"               -0.06
        " tăng"                -0.06
        "mix"                  -0.06
    Positive Logits
        " ScrollView"          0.07
        "]))."                 0.07
        " OPTIONS"             0.07
        "groupBy"              0.06
        " ]."                  0.06
        (token not rendered)   0.06
        ">)"                   0.06
        " ze"                  0.06
        "—he"                  0.06
        " np"                  0.06
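    The logit lists above rank vocabulary tokens by how strongly this neuron's output pushes their logits up or down. The sketch below assumes the scores come from projecting the neuron's output weight vector through the unembedding matrix. That is a common convention for such dashboards, but this page does not confirm it. The function name `neuron_logit_effects` and all weights and tokens are hypothetical toy values chosen for illustration.

```python
from typing import List, Sequence, Tuple


def neuron_logit_effects(
    w_out: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical shape: [d_model][vocab]
    vocab: Sequence[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top_positive, top_negative) lists of (token, score) pairs.

    Assumption: score[t] = sum_i w_out[i] * unembed[i][t], i.e. the
    neuron's direct effect on token t's logit per unit of activation.
    """
    scores = [
        sum(w_out[i] * unembed[i][t] for i in range(len(w_out)))
        for t in range(len(vocab))
    ]
    # Sort ascending, so the most negative effects come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    # Reverse so the most positive effects come first.
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


# Toy usage with a 2-dimensional residual stream and a 3-token vocabulary.
pos, neg = neuron_logit_effects(
    w_out=[1.0, -0.5],
    unembed=[[0.2, -0.1, 0.0], [0.0, 0.4, -0.2]],
    vocab=["a", "b", "c"],
    k=1,
)
# pos[0][0] == "a" (score 0.2); neg[0][0] == "b" (score about -0.3)
```

    Under this reading, the small magnitudes shown here (|score| ≤ 0.07) would mean the neuron has only a weak direct effect on any single output token.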
    Activation density: 0.040%
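    Activation density is presumably the percentage of tokens in some evaluation corpus on which the neuron fires. Because 0.040% = 0.0004, the figure corresponds to roughly 4 active tokens per 10,000. The sketch below assumes a token counts as "active" when its activation exceeds a threshold of 0. The page does not state the actual threshold or corpus, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty corpus
    return 100.0 * active / total


# 4 active tokens out of 10,000 gives a density of 0.04 (percent).
acts = [0.0] * 9996 + [1.0] * 4
density = activation_density(acts)
```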

    No Known Activations