INDEX
Explanations
first-person pronouns
descriptions of capabilities and functionalities of an AI language model.
This neuron activates on tokens in the assistant’s self-descriptions or capability listings, especially “I can…” statements and their accompanying list markers.
Negative Logits
stroke              -0.07
pec                 -0.06
_matching           -0.06
增加                -0.06
purchase            -0.06
MethodInvocation    -0.06
                    -0.06
-width              -0.06
tăng                -0.06
mix                 -0.06
Positive Logits
ScrollView          0.07
])).                0.07
OPTIONS             0.07
groupBy             0.06
].                  0.06
�                   0.06
>)                  0.06
ze                  0.06
—he                 0.06
np                  0.06
Activations Density 0.040%