INDEX
Explanations
This neuron detects mentions of the agent’s tool names (e.g., “Search” and “Calculator”) in the instruction or action lines.
Negative Logits
abre        -0.08
suç         -0.07
Fior        -0.07
페이지      -0.07
mt          -0.06
/buttons    -0.06
ornings     -0.06
/work       -0.06
Barton      -0.06
thieves     -0.06
Positive Logits
δι          0.07
Offset      0.06
iệp         0.06
submitting  0.06
...↵        0.06
elsinki     0.06
urlencode   0.06
Gardens     0.06
っぱい      0.06
sells       0.06
Activations Density: 0.005%