INDEX
    Explanations

    The neuron detects question/request turns: it fires on tokens that appear in user queries asking for factual information.

    The neuron detects user query turns, that is, lines where the user asks a question.

    Negative Logits
     instinctively    0.79
     magari           0.75
     ఆలో              0.75
    Hãy               0.75
     Пусть            0.73
     lepiej           0.71
     imagina          0.71
     autoestima       0.69
     misschien        0.68
    アイデア          0.68
    Positive Logits
     reportedly       1.02
     officially       0.97
     official         0.94
    erdapat           0.93
     Additionally     0.92
     According        0.90
     તેઓ              0.90
     Official         0.90
     there            0.88
     details          0.88
    Act Density 0.010%

    No Known Activations
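
For scale, the Act Density figure above can be read as a firing rate. The sketch below assumes density means the fraction of token positions where the neuron's activation is above zero; the page does not state its exact definition, so the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical. Under that assumption, 0.010% corresponds to roughly one firing token in 10,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition, not taken from the dashboard itself.
    """
    acts = list(activations)
    if not acts:
        return 0.0
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


# Hypothetical data: 10,000 token positions, exactly one of which fires.
acts = [0.0] * 10_000
acts[1234] = 0.8

density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low is consistent with the page listing no known activations: a small sample of text may simply contain no firing tokens.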