Explanations
questions/discussions
This neuron activates on tokens within detailed, step-by-step explanations or process-of-elimination reasoning in the assistant's answers.
Negative Logits
isc	-0.07
rop	-0.07
exam	-0.07
ワ	-0.06
isse	-0.06
.gwt	-0.06
Pandora	-0.06
addObserver	-0.06
airro	-0.06
losses	-0.06
Positive Logits
-interface	0.08
>You	0.06
การท	0.06
?>/	0.06
+m	0.06
۱۵	0.06
~	0.06
// ↵	0.06
//--------------------------------------------------------------------------------	0.06
จะต	0.06
Activation density: 0.030%