INDEX
Explanations
information and questions
The neuron fires on the assistant’s self-descriptive meta-language: phrases in which it explains its role, capabilities, or guidelines (e.g. “my primary function is to provide…” or “as an AI language model”).
Negative Logits
jd          -0.08
/code       -0.07
igmat       -0.06
ليم         -0.06
ajes        -0.06
_CH         -0.06
opposes     -0.06
btc         -0.06
організ     -0.06
odel        -0.06
Positive Logits
dễ          0.06
으나        0.06
waste       0.06
финансов    0.06
Swap        0.06
_TXT        0.06
.snapshot   0.06
.drawString 0.06
PAY         0.06
relevant    0.06
Activations Density 0.077%
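
The density figure above can be read as the share of token positions on which the neuron is active. The sketch below shows one way such a number might be computed. It assumes "active" means an activation strictly above zero, which this page does not confirm, and the function name `activation_density` and the sample data are hypothetical. The sample is synthetic and sized only so that the result matches the format of the figure above; it is not the data behind this neuron.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a position counts as active when its activation is strictly
    greater than `threshold` (0.0 by default). The dashboard may use a
    different definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 100,000 token positions, 77 of them active.
sample = [0.0] * 100_000
for i in range(77):
    sample[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the neuron is silent on the vast majority of tokens, which fits an explanation tied to a narrow class of phrases.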