INDEX
    Explanations

    Tokens that mark structural elements of a conversational format, such as speaker roles (user, assistant) and response channel selections (analysis, final, commentary).

    Negative Logits
    Token     Logit
    " PV"     -0.12
    " RM"     -0.12
    " AE"     -0.11
    " FV"     -0.11
    "PV"      -0.11
    " NM"     -0.11
    "RM"      -0.11
    "BM"      -0.11
    "Merc"    -0.11
    "NM"      -0.10
    Positive Logits
    Token             Logit
    "<|channel|>"     0.41
    "<|message|>"     0.32
    "<|constrain|>"   0.28
    "<|start|>"       0.18
    "婷婷"            0.16
    (blank)           0.16
    "<|end|>"         0.16
    "<|call|>"        0.15
    "琪琪"            0.15
    " ASUS"           0.15
    Activation Density: 0.591%

    No Known Activations