    Explanations

    This neuron detects mentions of the agent’s tool names (e.g., “Search” and “Calculator”) in the instruction or action lines.

    Negative Logits
     abre        -0.08
     suç         -0.07
     Fior        -0.07
     페이지      -0.07
    mt           -0.06
    /buttons     -0.06
    ornings      -0.06
    /work        -0.06
     Barton      -0.06
     thieves     -0.06
    Positive Logits
    δι           0.07
    Offset       0.06
    iệp          0.06
     submitting  0.06
    ...↵         0.06
    elsinki      0.06
    urlencode    0.06
     Gardens     0.06
    っぱい       0.06
     sells       0.06
    Act Density 0.005%

    No Known Activations