Explanations
code/data structure
This neuron fires on the structured “Question:” prompt header, i.e. the tokens that label the user’s question (such as “Question:”, “the”, “input”, “question”, “you”) in the few-shot prompt format.
Negative Logits
털            -0.06
Density      -0.06
AMP          -0.06
debian       -0.06
فو           -0.06
.clf         -0.06
Dead         -0.06
�            -0.06
freshwater   -0.06
_bt          -0.06
Positive Logits
crets        0.07
protested    0.06
губер        0.06
rubble       0.06
eful         0.06
sess         0.06
ئت           0.06
’ll          0.06
ULA          0.06
'll          0.06
Activations Density 0.002%
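
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "activation density" means the fraction of tokens on which the neuron's activation is above zero; the page does not define the term, so treat that as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is the share of tokens on which the neuron fires
    at all, not an average of activation magnitudes.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the neuron fires on 2 tokens out of 100,000.
sample = [0.0] * 99_998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.002%"
```

Under this reading, a density of 0.002% corresponds to roughly one active token in every 50,000, which fits a neuron tied to a narrow prompt-header pattern.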