    Explanations

    function parameters

    The neuron activates strongly on natural-language explanations of function signatures, for example phrases like “takes X arguments: the first…, and the second….”

    Negative Logits
    Token         Logit
    " paralle"    -0.07
    (missing)     -0.07
    "-road"       -0.06
    " bidder"     -0.06
    (missing)     -0.06
    "ικού"        -0.06
    "_school"     -0.06
    " Session"    -0.06
    "Inputs"      -0.06
    "=logging"    -0.06
    Positive Logits
    Token            Logit
    "SID"            0.06
    "sexy"           0.06
    " Jeff"          0.06
    " Automobile"    0.06
    "؟↵"             0.06
    "Fant"           0.06
    "078"            0.06
    "öt"             0.06
    " Got"           0.06
    "usalem"         0.06
    Activation Density: 0.024%

    No Known Activations