INDEX
Explanations
The neuron activates selectively on interrogative words (e.g. “What”) at the start of a user’s question.
Negative Logits
الش    -0.06
PLATFORM    -0.06
-no    -0.06
los    -0.06
�    -0.06
-system    -0.06
heads    -0.06
Thumb    -0.06
polo    -0.06
문화    -0.06
Positive Logits
سكان    0.07
"",↵    0.07
injuring    0.06
Taken    0.06
čka    0.06
(sec    0.06
_div    0.06
'/>↵    0.06
(visitor    0.06
smrt    0.06
Activations Density 0.046%
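
A density figure like the one above can be read as the share of sampled tokens on which the neuron fires. The sketch below shows that calculation under the assumption that "density" means the percentage of tokens with an activation above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of sampled tokens on which the
    neuron is active, expressed as a percentage.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 10,000 tokens, of which 5 have a nonzero activation.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
print(f"{activation_density(sample):.3f}%")  # prints "0.050%"
```

Under this reading, a density of 0.046% would correspond to the neuron firing on roughly 46 out of every 100,000 sampled tokens, which fits a neuron that responds only to a narrow pattern such as a question word at the start of a user's question.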