Explanations
The neuron primarily activates on the verb “pull” (especially as part of the phrase “pull off”).
Negative Logits
723      -0.07
ize      -0.07
_desc    -0.07
[x       -0.07
resents  -0.07
encode   -0.07
819      -0.06
ouis     -0.06
         -0.06
izes     -0.06
Positive Logits
pull     0.14
pulled   0.13
pulls    0.11
Pull     0.11
Pull     0.11
pulling  0.10
.Pull    0.10
�        0.08
tug      0.08
LL       0.08
Activations Density 0.012%
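To make the density figure concrete, here is a minimal sketch that assumes "activations density" means the fraction of token positions on which the neuron's activation is above zero. That definition, the function name activation_density, the threshold parameter, and the toy data are all assumptions for illustration; they do not come from the page itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (positions that fire) / (all positions).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 100,000 token positions, 12 of which fire.
toy = [0.0] * 99_988 + [1.5] * 12
density = activation_density(toy)
print(f"{density:.3%}")
```

Under that assumed definition, a density of 0.012% would correspond to the neuron firing on roughly 12 of every 100,000 tokens, which fits a neuron tied to a single verb and its inflections.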