INDEX
    Explanations

    This neuron activates on the special metadata/control tokens (e.g. header-ID, begin/end markers, speaker tags) that structure the conversation rather than on ordinary content words.

    Instructions that set covert goals for roleplay, steering the conversation subtly toward a hidden agenda or eliciting help or money without stating it directly.

    Negative Logits
    Token              Logit
    "!?"               -0.07
    "vanized"          -0.06
    "Drivers"          -0.06
    ".opendaylight"    -0.06
    " Similar"         -0.06
    " 文件"             -0.06
    " Ramirez"         -0.06
    "apesh"            -0.05
    " Sanity"          -0.05
    " عبدال"           -0.05

    Positive Logits
    Token              Logit
    " Anglo"            0.07
    ".pages"            0.07
    ".intersection"     0.07
    "ایش"               0.06
    "\tUN"              0.06
    " panc"             0.06
    ".window"           0.06
    "icans"             0.06
    "igma"              0.06
    ".report"           0.06
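    The two logit lists rank vocabulary tokens by how strongly this neuron pushes their logits down or up. The page does not say how those numbers are computed, so the following is only a minimal sketch under a stated assumption: each token's score is the dot product of the neuron's output weight vector (`w_out`, a hypothetical name) with that token's column of the unembedding matrix (`W_U`, also hypothetical). Real tools may apply the final layer norm or other scaling first; this sketch ignores that.

```python
# Hypothetical sketch: rank tokens by a neuron's direct effect on the output logits.
# Assumption (not from the source page): effect[t] = dot(w_out, W_U[:, t]).
# Any final layer norm or rescaling a real tool applies is ignored here.

def logit_effects(w_out, W_U):
    """Return one score per vocabulary token; W_U is d_model rows x vocab columns."""
    d_model = len(w_out)
    vocab = len(W_U[0])
    return [sum(w_out[i] * W_U[i][t] for i in range(d_model)) for t in range(vocab)]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    bottom = [(tokens[t], effects[t]) for t in order[:k]]
    top = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return top, bottom


# Toy example with made-up weights: d_model = 2, vocabulary of 3 tokens.
w_out = [1.0, -0.5]
W_U = [[0.2, -0.1, 0.0],
       [0.1, 0.3, -0.2]]
tokens = ["a", "b", "c"]

top, bottom = top_and_bottom(logit_effects(w_out, W_U), tokens, k=1)
print("positive:", top)
print("negative:", bottom)
```

    With these toy weights, token "a" gets the largest score and "b" the smallest, so they head the positive and negative lists respectively.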
    Activation density: 0.028%
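    An activation density of 0.028% corresponds to roughly 28 tokens out of every 100,000. A minimal sketch of the arithmetic, assuming density means the fraction of sampled tokens on which the neuron's activation exceeds a threshold of 0 (the threshold and the function name `activation_density` are assumptions, not taken from the page):

```python
# Hypothetical sketch: activation density = fraction of tokens where the neuron fires.
# Assumption: "fires" means activation > threshold, with threshold = 0.0 by default.

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: the neuron fires on 28 of 100,000 tokens.
acts = [0.0] * 99_972 + [1.0] * 28
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.028%"
```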

    No Known Activations