INDEX
    Explanations

    Offering help

    The neuron fires on the assistant’s self-referential offers of help (e.g. “I’ll do my best to help”).

    Negative Logits
    Token             Logit
    "bz"              -0.08
    "/files"          -0.08
    " 경제"           -0.06
    "ोर"              -0.06
    " gar"            -0.06
    " сф"             -0.06
    "ασ"              -0.06
    "function"        -0.06
    ".and"            -0.06
    "astr"            -0.06

    Positive Logits
    Token             Logit
    "기도"            0.07
    " Flavor"         0.07
    " bruk"           0.06
    "ания"            0.06
    "_leg"            0.06
    "ерк"             0.06
    " vým"            0.06
    "<Player"         0.06
    "TexParameter"    0.06
    "FromFile"        0.06

    Act Density 0.033%

    No Known Activations
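
As a rough illustration of what an activation density figure such as the 0.033% above reports, the sketch below assumes that density means the fraction of token positions at which the neuron's activation exceeds zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: a position counts as "active" when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 3 active positions out of 10,000 tokens.
acts = [0.0] * 10_000
for i in (12, 4_500, 9_001):
    acts[i] = 1.7

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.030%
```

Under this assumed definition, a density of 0.033% would correspond to roughly 33 active positions per 100,000 tokens.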