Explanations
The neuron is triggered by occurrences of the word “answer” (in any capitalization).
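As a rough illustration of what this explanation describes, the sketch below checks text for the word "answer" regardless of capitalization. It is only a text-level proxy, not the neuron itself or the dashboard's method; the names `ANSWER_WORD` and `mentions_answer` are hypothetical and chosen for this example.

```python
import re

# Hypothetical text-level proxy for the explanation above: matches the
# whole word "answer", ignoring case. This is an assumption for
# illustration and does not reproduce the neuron's actual activations.
ANSWER_WORD = re.compile(r"\banswer\b", re.IGNORECASE)


def mentions_answer(text: str) -> bool:
    """Return True if `text` contains the word "answer" in any capitalization."""
    return ANSWER_WORD.search(text) is not None


if __name__ == "__main__":
    for sample in ["The ANSWER is 42", "Answer: yes", "No reply was given"]:
        print(sample, "->", mentions_answer(sample))
```

Because the pattern uses word boundaries, inflected forms such as "answers" or "answered" do not match; loosening the pattern to cover them would be a separate design choice.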
Negative Logits
relocation    -0.08
deployment    -0.08
registro      -0.07
496           -0.07
yectos        -0.07
perience      -0.07
com           -0.07
glm           -0.07
CLLocation    -0.07
_location     -0.07
Positive Logits
answer      0.13
answers     0.12
Answer      0.10
answered    0.10
answer      0.09
ans         0.09
(answer     0.08
as          0.08
ans         0.08
AB          0.08
Activations Density 0.079%