Explanations
This neuron selectively activates on the auxiliary verb “does.”
Negative Logits
Token      Logit
/meta      -0.06
ownt       -0.06
IAN        -0.06
Seah       -0.06
Barbar     -0.06
Every      -0.06
marathon   -0.06
lun        -0.06
Marathon   -0.06
REAM       -0.06
Positive Logits
Token      Logit
does       0.13
did        0.10
doesn      0.10
did        0.10
does       0.09
don        0.09
is         0.09
do         0.09
doesn      0.09
—is        0.09
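The page does not say how the logit lists are computed. A common convention for neuron dashboards is to project the neuron's output weight vector through the model's unembedding matrix and report the most positive and most negative entries. The following is a minimal sketch assuming that convention; `logit_effects`, `top_logits`, `w_out`, and `w_u` are hypothetical names, and the tiny matrices are toy values, not this neuron's weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(w_out: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Direct effect of a neuron on each output logit (hypothetical helper).

    w_out: the neuron's output weights, one per residual-stream dimension.
    w_u: unembedding matrix given as d_model rows of vocab_size columns.
    """
    if len(w_out) != len(w_u):
        raise ValueError("w_out and w_u must share the d_model dimension")
    vocab_size = len(w_u[0])
    # Dot product of w_out with each column (token) of the unembedding.
    return [sum(w_out[i] * w_u[i][t] for i in range(len(w_out))) for t in range(vocab_size)]


def top_logits(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example (assumed values): 2-dimensional residual stream, 4-token vocabulary.
vocab = ["does", "did", "marathon", "Every"]
w_out = [1.0, -0.5]
w_u = [
    [0.25, 0.125, 0.0, -0.0625],
    [-0.5, 0.0, 0.25, 0.0],
]
effects = logit_effects(w_out, w_u)
positive, negative = top_logits(effects, vocab, k=2)
print("positive:", positive)  # positive: [('does', 0.5), ('did', 0.125)]
print("negative:", negative)  # negative: [('marathon', -0.125), ('Every', -0.0625)]
```

Under this reading, a token appears in the positive list when the neuron's output direction pushes its logit up, which is consistent with the "does"/"did"/"doesn" tokens dominating that list.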
Activation Density: 0.051%
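Activation density is presumably the fraction of sampled tokens on which the neuron fires, so 0.051% would correspond to roughly 51 activating tokens per 100,000. The sketch below assumes "fires" means an activation strictly above zero; the dashboard's actual threshold and token sample are not stated on this page, and `activation_density` is a hypothetical helper applied to toy data.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample (assumed): 100,000 tokens, of which 51 activate the neuron.
activations = [0.0] * (100_000 - 51) + [0.7] * 51
density = activation_density(activations)
print(f"Activation density: {density:.3%}")  # Activation density: 0.051%
```

A density this low means the neuron is silent on almost all tokens, which fits an explanation tied to a single auxiliary verb.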