    Explanations

    The neuron flags special chain-of-thought or reasoning control tokens (e.g., "Thought", "Action", "Observation") in the model's internal transcript.

    Negative Logits
     ảnh       -0.06
     vamp      -0.06
               -0.06
    σια        -0.06
     Seas      -0.06
    trip       -0.06
    .micro     -0.06
    ذه         -0.06
    ấn         -0.06
    leitung    -0.06

    Positive Logits
    росто       0.07
     casinos    0.07
                0.07
     Emails     0.07
    [..         0.06
    orative     0.06
                0.06
    Wide        0.06
    ...↵        0.06
    oward       0.06
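    The page does not say how these logit lists are computed. A common approach for neuron dashboards, assumed here, is direct logit attribution. It projects the neuron's output weight vector through the model's unembedding matrix and ranks the resulting per-token values, so the largest values become "Positive Logits" and the smallest become "Negative Logits". The minimal pure-Python sketch below illustrates that idea. The names `neuron_logit_effects`, `w_out`, and `w_unembed` are hypothetical. The sketch also ignores any final LayerNorm scaling that a real tool might fold in.

```python
from typing import List, Tuple


def neuron_logit_effects(
    w_out: List[float],
    w_unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: rank tokens by one neuron's direct effect on the logits.

    w_out:     the neuron's output weight vector (length d_model)
    w_unembed: unembedding matrix as row-major lists, shape d_model x vocab_size
    vocab:     token strings, one per unembedding column
    Returns (top_k_positive, top_k_negative) as (token, value) pairs.
    """
    d_model = len(w_out)
    if len(w_unembed) != d_model:
        raise ValueError("w_unembed must have d_model rows")
    if any(len(row) != len(vocab) for row in w_unembed):
        raise ValueError("each w_unembed row must have one entry per vocab token")

    # Direct logit effect of the neuron on token t: dot(w_out, column t of w_unembed).
    values = [
        sum(w_out[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]

    # Sort ascending, so the most negative effects come first.
    ranked = sorted(zip(vocab, values), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 3 tokens.
    pos, neg = neuron_logit_effects(
        w_out=[1.0, -0.5],
        w_unembed=[[0.2, -0.1, 0.0], [0.0, 0.4, -0.2]],
        vocab=["a", "b", "c"],
        k=2,
    )
    print("positive:", pos)
    print("negative:", neg)
```

    On real models the same ranking is usually done with tensor libraries over the full vocabulary. Values as small as the ±0.06 to 0.07 shown here would indicate only a weak direct effect on any single token.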
    Act Density 0.001%
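    As arithmetic, an activation density of 0.001% is 0.00001, or roughly one active token per 100,000 tokens sampled. The page does not define "active". The sketch below assumes it means an activation strictly greater than a threshold of 0.0, counted over whatever token sample the dashboard used. The name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions where the neuron is active.

    "Active" is assumed to mean activation > threshold (default 0.0); the
    dashboard's actual definition is not stated.
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy example: a single firing among 100,000 token positions.
    acts = [0.0] * 99_999 + [2.5]
    density = activation_density(acts)
    print(f"{density:.3%}")  # prints 0.001%
```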

    No Known Activations