    Explanations

    This neuron detects words and phrases that introduce questions (e.g. “how,” “can,” “we”) at the start of interrogative sentences.

    Negative Logits
    Token          Logit
    "trust"        -0.06
    "Phi"          -0.06
    " cottage"     -0.06
    "impact"       -0.06
    " prevent"     -0.06
    (blank)        -0.06
    "script"       -0.06
    " функци"      -0.06
    "TYPE"         -0.06
    "icont"        -0.06
    Positive Logits
    Token          Logit
    "老师"          0.07
    " чоловік"     0.07
    "-ranking"     0.07
    "ские"         0.07
    "imo"          0.06
    "енность"      0.06
    "oralType"     0.06
    " tuner"       0.06
    (blank)        0.06
    "(se"          0.06
    Activation Density: 0.052%

    No Known Activations