INDEX
    Explanations

    words or tokens related to programming, technical terms, or conversational roles within code or instruction-like contexts.

    instructions about how to analyze, process, or structure responses to user queries.

    chat-style conversation scaffolding, especially role markers, prompt/instruction meta text, and assistant reply boilerplate within multi-turn dialogues.

    references to specific test strings or identifiers (particularly "davidjl") being analyzed or manipulated in conversational exchanges.

    Negative Logits
    Token    Logit
    iap    -0.08
    atient    -0.07
    😳    -0.07
    eng    -0.07
    事を    -0.07
    正是因为    -0.07
    Concat    -0.07
    jt    -0.07
    ماذا    -0.07
    .LayoutControlItem    -0.07
    Positive Logits
    Token    Logit
     giải    0.08
    _of    0.08
    _expr    0.08
     продук    0.07
    submission    0.07
    _SPLIT    0.07
    老婆    0.07
    .div    0.07
    ся    0.07
     пара    0.07
    Activation Density: 42.091%

    No Known Activations