INDEX
    Explanations

    The neuron flags tokens in the tool-specification section of the prompt, i.e. the names of the provided Python tools and the keywords describing their inputs, outputs, and behavior.

    Negative Logits
    _Internal     -0.07
    озмож         -0.07
    range         -0.06
    _amt          -0.06
    сия           -0.06
     surrogate    -0.06
    olland        -0.06
    ,user         -0.06
     defenders    -0.06
    yal           -0.06
    Positive Logits
     Doch         0.07
     offen        0.06
                  0.06
     Sadly        0.06
    Sad           0.06
     gemacht      0.06
                  0.06
     predictions  0.06
    兄弟            0.06
    Translation   0.06
    Act Density 0.011%

    No Known Activations