Explanations
The neuron detects question/request turns: it fires on tokens in user queries that ask for factual information.
The neuron detects user query turns, that is, lines where the user asks a question.
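Negative Logits
instinctively  0.79
magari  0.75  (Italian: "maybe")
ఆలో  0.75  (Telugu subword)
Hãy  0.75  (Vietnamese imperative particle: "let's", "do")
Пусть  0.73  (Russian: "let")
lepiej  0.71  (Polish: "better")
imagina  0.71  (Spanish/Portuguese: "imagine")
autoestima  0.69  (Spanish/Portuguese: "self-esteem")
misschien  0.68  (Dutch: "maybe")
アイデア  0.68  (Japanese: "idea")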
Positive Logits
reportedly  1.02
officially  0.97
official  0.94
erdapat  0.93  (subword of Indonesian "terdapat": "there is/are")
Additionally  0.92
According  0.90
તેઓ  0.90  (Gujarati: "they")
Official  0.90
there  0.88
details  0.88
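Scores like these are often obtained by projecting a neuron's output weights through the model's unembedding matrix and ranking the vocabulary. The page does not say how its values were computed, so the following is only a minimal sketch under that assumption. `top_logit_effects`, `w_out`, `W_U`, and the toy data are hypothetical, and unlike the lists above, the sketch reports negative effects with a minus sign.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank vocabulary tokens by how much a neuron's output pushes their logits.

    Assumption: the neuron's direct effect on logit t is w_out @ W_U[:, t].
    w_out: shape (d_model,), the neuron's output weight vector (hypothetical).
    W_U:   shape (d_model, vocab_size), the unembedding matrix (hypothetical).
    Returns (top_positive, top_negative) as lists of (token, score) pairs.
    """
    scores = w_out @ W_U                 # one direct-effect score per token
    order = np.argsort(scores)           # ascending: most negative first
    top_neg = [(vocab[i], float(scores[i])) for i in order[:k]]
    top_pos = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return top_pos, top_neg

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
w_out = np.array([1.0, 0.0])
W_U = np.array([[1.0, -2.0, 0.5],
                [0.0,  0.0, 0.0]])
vocab = ["reportedly", "maybe", "there"]
top_pos, top_neg = top_logit_effects(w_out, W_U, vocab, k=2)
print(top_pos)  # [('reportedly', 1.0), ('there', 0.5)]
print(top_neg)  # [('maybe', -2.0), ('there', 0.5)]
```

This direct projection ignores any later layers that might transform the neuron's output, which is why such lists are usually read as a rough signal rather than the neuron's full effect.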
Activation density: 0.010%
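Activation density is commonly defined as the share of dataset tokens on which the neuron's activation exceeds zero. The page does not spell out its definition, so this is a minimal sketch assuming that one; `activation_density` and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Hypothetical data: 10,000 tokens, only one of which activates the neuron.
acts = np.zeros(10_000)
acts[42] = 3.1
density = activation_density(acts)
print(f"Activation density: {density * 100:.3f}%")  # prints "Activation density: 0.010%"
```

Under this definition, a density of 0.010% means roughly one firing token in every 10,000, so the neuron is very sparse.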