Explanations
The neuron fires on special chain-of-thought (reasoning-control) tokens such as “Thought”, “Action”, and “Observation” in the model’s internal transcript.
Negative Logits
Token       Logit
ảnh         -0.06
vamp        -0.06
승          -0.06
σια         -0.06
Seas        -0.06
trip        -0.06
.micro      -0.06
ذه          -0.06
ấn          -0.06
leitung     -0.06
Positive Logits
Token       Logit
росто       0.07
casinos     0.07
ว           0.07
Emails      0.07
[..         0.06
orative     0.06
�           0.06
Wide        0.06
...↵        0.06
oward       0.06
Activations Density 0.001%