    Explanations

    This neuron detects instruction‐format labels (e.g. “Action,” “Input,” “Thought,” “Observation,” “Final Answer”) in dialogue text.

    Negative Logits
     mxArray
    -0.06
     witty
    -0.06
     }),↵↵
    -0.06
    _reward
    -0.06
    ۱۰
    -0.06
    ربه
    -0.06
    	that
    -0.06
    etroit
    -0.06
     "{}
    -0.06
    grese
    -0.06
    Positive Logits
     Lista
    0.07
     Plate
    0.07
     Pascal
    0.06
     инт
    0.06
    498
    0.06
    Real
    0.06
    material
    0.06
     collections
    0.06
    _span
    0.06
     Hannity
    0.06
    Activation Density: 0.004%

    No Known Activations