INDEX
Explanations
The neuron flags tokens in the tool-specification section of the prompt, i.e., the names of the provided Python tools and the keywords describing their inputs, outputs, and behavior.
Negative Logits
_Internal    -0.07
озмож        -0.07
range        -0.06
_amt         -0.06
сия          -0.06
surrogate    -0.06
olland       -0.06
,user        -0.06
defenders    -0.06
yal          -0.06
Positive Logits
Doch         0.07
offen        0.06
勇           0.06
Sadly        0.06
Sad          0.06
gemacht      0.06
珠           0.06
predictions  0.06
兄弟         0.06
Translation  0.06
Activations Density 0.011%