Explanations
function parameters
The neuron activates strongly on natural-language explanations of function signatures, for example phrases like "takes X arguments: the first…, and the second…."
Negative Logits
paralle     -0.07
掛          -0.07
-road       -0.06
bidder      -0.06
혜          -0.06
ικού        -0.06
_school     -0.06
Session     -0.06
Inputs      -0.06
=logging    -0.06
Positive Logits
SID          0.06
sexy         0.06
Jeff         0.06
Automobile   0.06
؟↵           0.06
Fant         0.06
078          0.06
öt           0.06
Got          0.06
usalem       0.06
Activation density: 0.024%