Explanations
This neuron activates on occurrences of the interrogative “who” (i.e. questions asking about a person).
Negative Logits
Daniels      -0.08
overflow     -0.07
fprintf      -0.06
生           -0.06
افع          -0.06
flag         -0.06
_books       -0.06
sterreich    -0.06
Det          -0.06
Dia          -0.06
Positive Logits
corre        0.07
:pk          0.07
ues          0.06
/User        0.06
camera       0.06
.tasks       0.06
zilla        0.06
格           0.06
accuses      0.06
initialize   0.06
Activations Density 0.017%
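
The density figure can be read as the share of tokens on which the neuron fires. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical, and the page itself does not say how its 0.017% figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) activation; the source does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 17 active tokens out of 100,000.
sample = [1.0] * 17 + [0.0] * 99_983
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.017% corresponds to roughly 17 activating tokens per 100,000, which fits a neuron described as responding to a single word such as "who".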