INDEX
Explanations
instructions
This neuron responds to the header label “Instruction” and to adjoining words such as “and” and “Question”; in other words, it detects the formatted instruction section of a prompt.
Negative Logits
тап          -0.07
Од           -0.06
KY           -0.06
้าย          -0.06
\            -0.06
descended    -0.06
]="          -0.06
(rec         -0.06
何           -0.06
PyErr        -0.06
Positive Logits
Canary       0.07
expense      0.06
FORMATION    0.06
leo          0.06
anker        0.06
aira         0.06
.runners     0.06
grands       0.06
CDN          0.06
olumsuz      0.06
Activations Density 0.013%