Explanations
The neuron activates on mentions of mining actions and tool‐related terms (like “mine” and “tool”) in the code.
Negative Logits
prevents     -0.07
daughters    -0.06
computer     -0.06
ox           -0.06
SECTION      -0.06
raid         -0.06
sniper       -0.06
ment         -0.06
-work        -0.06
"What        -0.06
Positive Logits
poru         0.07
DISTRIBUT    0.07
लड           0.07
концентра    0.07
ánchez       0.07
expanded     0.06
arlar        0.06
lein         0.06
Literal      0.06
dashed       0.06
Activations Density 0.487%