    Explanations

    questions and answers

    This neuron detects tokens associated with yes/no questions and answers (e.g., the words “yes,” “no,” and related instruction cues).

    Negative Logits
    "/ms"              -0.06
    "/config"          -0.06
    " swim"            -0.06
    " Russo"           -0.06
    ".authorization"   -0.06
    ".exam"            -0.06
    " xi"              -0.06
    "Г"                -0.06
    " eval"            -0.06
    " spouse"          -0.06
    Positive Logits
    [token not rendered]   0.07
    " جدا"                 0.07
    "σκ"                   0.07
    " komen"               0.07
    "_lazy"                0.06
    " landlords"           0.06
    "การจ"                 0.06
    "พน"                   0.06
    "*:"                   0.06
    "($_"                  0.06
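    One common way to produce lists like the two above, assumed here rather than confirmed by this page, is to project the neuron's output weight vector through the model's unembedding matrix and sort the per-token scores. The sketch below does that on a hypothetical toy model; `w_out`, `W_U`, the placeholder vocabulary, and both helper functions are illustrative names, and the final layer norm is ignored.

```python
# Minimal sketch (assumption): a neuron's direct effect on each token's logit is
# the dot product of its output weight vector with that token's unembedding
# column. Final layer norm and biases are ignored for simplicity.

def direct_logit_effects(w_out, W_U):
    """Return one score per vocabulary token: sum over d of w_out[d] * W_U[d][v]."""
    d_model = len(w_out)
    vocab_size = len(W_U[0])
    return [
        sum(w_out[d] * W_U[d][v] for d in range(d_model))
        for v in range(vocab_size)
    ]


def top_and_bottom(vocab, scores, k):
    """Return the k most-promoted and k most-suppressed (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative


# Hypothetical toy model: d_model = 3, vocabulary of 5 placeholder tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d", "tok_e"]
w_out = [0.5, -0.2, 0.1]
W_U = [
    [0.2, 0.1, -0.1, -0.3, 0.0],
    [0.0, 0.3, 0.2, 0.1, -0.1],
    [0.1, 0.0, -0.2, 0.0, 0.4],
]

scores = direct_logit_effects(w_out, W_U)
positive, negative = top_and_bottom(vocab, scores, k=2)
print("positive:", [tok for tok, _ in positive])   # ['tok_a', 'tok_e']
print("negative:", [tok for tok, _ in negative])   # ['tok_d', 'tok_c']
```

    Because the scores come straight from the weights, this view says which tokens the neuron nudges when it fires, not which inputs make it fire.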
    Activation density: 0.073%
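    If activation density is read as the share of token positions on which the neuron is active (an assumption; the page does not define it or name the dataset), 0.073% means roughly 7 active positions per 10,000 tokens. A minimal sketch of that calculation, with a hypothetical threshold of 0.0 and made-up activations:

```python
# Minimal sketch (assumption): "activation density" is the percentage of token
# positions whose activation exceeds a threshold. The 0.0 default is a
# hypothetical choice, not taken from the page.

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical activations: the neuron fires on 1 position out of 10,000.
acts = [0.0] * 9_999 + [1.3]
print(f"{activation_density(acts):.3f}%")   # 0.010%
```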

    No Known Activations