Explanations
command line, regex
The neuron activates on words such as “This” and “Here,” and on similar demonstratives that introduce explanatory commentary; that is, it detects phrases that open a code comment or an explanation.
Negative Logits
Ib        -0.07
land      -0.07
_setup    -0.07
スペ      -0.06
Collect   -0.06
babys     -0.06
olsun     -0.06
Registr   -0.06
Toy       -0.06
car       -0.06
Positive Logits
°         0.06
âk        0.06
];↵↵      0.06
Claim     0.06
optgroup  0.06
↵         0.06
鎮        0.06
приз      0.06
Admin     0.06
든        0.06
Activations Density: 0.060%