INDEX
Explanations
This neuron activates on the part of an instruction prompt that requires the answer to begin explicitly with "Yes" or "No", i.e., the directive about how to format the response.
Negative Logits
Tah             -0.06
viso            -0.06
\Test           -0.06
++;↵            -0.06
این             -0.06
enou            -0.06
printk          -0.06
Boehner         -0.05
(and            -0.05
자의            -0.05
Positive Logits
uring           0.07
converged       0.07
مورد            0.07
unidentified    0.07
lending         0.07
-local          0.07
slide           0.07
squirrel        0.07
RELATED         0.07
-party          0.06
Activations Density 0.008%