Explanations
code/web pages
This neuron activates strongly on tokens in the system-instruction section, especially first-person pronouns (“I”), modal verbs (“will”), and the surrounding punctuation, marking the model’s persona/setup prompt.
Negative Logits
Token         Logit
جام           -0.06
languages     -0.06
線            -0.06
Since         -0.06
Legacy        -0.06
.Contracts    -0.06
licted        -0.06
Recommend     -0.06
\Service      -0.06
.Attribute    -0.06
Positive Logits
Token         Logit
Shut          0.07
enticator     0.07
cons          0.06
виб           0.06
#$            0.06
τής           0.06
IRON          0.06
söz           0.06
ABL           0.06
financially   0.06
Activations Density: 0.002%