INDEX
    Explanations

    This neuron fires on direct imperative user instructions addressed to the assistant (e.g. “tell me,” “inform me,” “give me”), i.e. attempts to drive the assistant to break its normal rules.

    Negative Logits
     medication    -0.07
    	Assert    -0.07
    (token not shown)    -0.06
    (token not shown)    -0.06
    人才    -0.06
     prediction    -0.06
    そんな    -0.06
     Kam    -0.06
    かった    -0.06
    มหานคร    -0.06
    Positive Logits
    Skin    0.08
    .Unlock    0.07
     aute    0.07
    ":"/    0.06
     ense    0.06
     gi    0.06
     setSearch    0.06
     saldo    0.06
     FY    0.06
    	old    0.06
    Act Density 0.010%

    No Known Activations