    Explanations

    This neuron detects explicit user or system directives telling the model to maintain or resume a particular role or “stay in character.”

    Negative Logits
    Token            Logit
    " laundering"    -0.07
    " forcibly"      -0.07
    " besoin"        -0.06
    "):↵↵"           -0.06
    " участ"         -0.06
    "_ln"            -0.06
    "/INFO"          -0.06
    "/Library"       -0.06
    " Goals"         -0.06
    ".finish"        -0.06
    Positive Logits
    Token                   Logit
    "crease"                0.07
    " scaler"               0.07
    " comparison"           0.06
    "Listening"             0.06
    (token not rendered)    0.06
    "/us"                   0.06
    (token not rendered)    0.06
    " WT"                   0.06
    " expectations"         0.06
    " खबर"                  0.06
    Activation Density: 0.003%

    No Known Activations