INDEX
Explanations
This neuron detects mentions of “ear” (ear-related tokens).
Negative Logits
img          -0.08
diff         -0.07
建设         -0.07
Boston       -0.07
70           -0.07
.Blue        -0.07
obligated    -0.07
Cox          -0.07
“No          -0.07
cmd          -0.07
Positive Logits
ears         0.13
ear          0.12
Ear          0.12
earrings     0.09
Ear          0.08
ra           0.07
earing       0.07
ar           0.07
AR           0.07
rees         0.07
Activations Density 0.007%
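
The page does not say how the density figure is defined. A minimal sketch, assuming it is the fraction of token positions at which the neuron's activation exceeds zero, might look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the neuron fires.
    The dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the neuron fires on 7 of 100,000 token positions.
sample = [0.0] * 99_993 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.007% means the neuron is active on roughly 7 of every 100,000 tokens, which fits a neuron tied to a narrow set of ear-related tokens.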