INDEX
    Explanations

    Conversational responses

    The neuron fires on the assistant’s polite greeting and offer-to-help phrases (e.g. “Hello! I’d be happy to help you with…”).

    Negative Logits
     времени      -0.07
    ает           -0.07
    .eng          -0.06
    belief        -0.06
    .Tree         -0.06
                  -0.06
    obe           -0.06
    _ATOMIC       -0.06
    mach          -0.06
     интер        -0.06
    Positive Logits
    oen           0.08
     cursed       0.07
     heel         0.07
     cite         0.07
    elow          0.06
     jon          0.06
     terrible     0.06
    =['           0.06
    [data         0.06
     sustaining   0.06
    Act Density 0.064%

    No Known Activations