INDEX
Explanations
This neuron fires on the gerund “responding” (as in “responding to”) appearing in instruction phrases.
Negative Logits
Cast        -0.06
�           -0.06
incy        -0.06
111         -0.06
jobs        -0.06
Ung         -0.06
baked       -0.06
Segoe       -0.06
६           -0.06
beds        -0.06
Positive Logits
nullptr     0.07
Canadian    0.07
freshness   0.07
empower     0.07
](↵         0.07
писание     0.07
impacting   0.07
oubted      0.06
strlen      0.06
noisy       0.06
Activations Density 0.018%