INDEX
Explanations
This neuron detects the special instruction‐format labels (e.g. “Action,” “Input,” “Thought,” “Observation,” “Final Answer”) in the dialogue.
Negative Logits
mxArray
-0.06
witty
-0.06
}),↵↵
-0.06
_reward
-0.06
۱۰
-0.06
ربه
-0.06
that
-0.06
etroit
-0.06
"{}
-0.06
grese
-0.06
Positive Logits
Lista
0.07
Plate
0.07
Pascal
0.06
инт
0.06
498
0.06
Real
0.06
material
0.06
collections
0.06
_span
0.06
Hannity
0.06
Activations Density 0.004%