    Explanations

    The neuron detects the instruction words “question” and “answers” in prompts that ask whether a sentence answers a given question.

    Negative Logits
    " oste"                -0.06
    "ková"                 -0.06
    " Robot"               -0.06
    " müdür"               -0.06
    "Notice"               -0.06
    " Cel"                 -0.06
    " 감독"                -0.06
    "원을"                 -0.06
    " Sour"                -0.06
    " Notice"              -0.06

    Positive Logits
    "udo"                   0.07
    " eternity"             0.06
    " background"           0.06
    "เพลง"                  0.06
    " convoy"               0.06
    " systemFontOfSize"     0.06
    "patial"                0.06
    "acje"                  0.06
    "一起"                  0.06
    "_ajax"                 0.05

    Activation Density: 0.001%

    No Known Activations