    Explanations

    The neuron activates on imperative action words (e.g. “use,” “add”) in instruction-style prompts.

    Negative Logits
     baskets    -0.07
    Jones       -0.07
    -fold       -0.07
    fold        -0.07
    _pf         -0.07
    beck        -0.07
    _REPORT     -0.06
    Deep        -0.06
    ewolf       -0.06
    #create     -0.06
    Positive Logits
     تجه             0.07
    ("!              0.07
     fecha           0.06
    _EC              0.06
     perso           0.06
                     0.06
     alleen          0.06
     Türkçe          0.06
     subsidiaries    0.06
    اصيل             0.06
    Activation Density: 0.021%

    No Known Activations