INDEX
    Explanations

    questions and responses

    This neuron detects text produced by the assistant: assistant-role turns (its replies) as well as self-referential or corrective utterances.

    Negative Logits
    Token          Logit
    ܢ              -0.07
                   -0.07
    Viewport       -0.07
    .models        -0.07
    روم            -0.07
     Woo           -0.06
    ء              -0.06
    חלב            -0.06
     Pou           -0.06
    е              -0.06
    Positive Logits
    Token          Logit
    oklyn          0.08
    uição          0.07
    Mom            0.07
    )dealloc       0.07
     Aboriginal    0.07
    笼罩           0.07
    auf            0.07
    AtIndex        0.07
     decisive      0.07
    räg            0.07
    Activation Density: 0.074%
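To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of sampled token positions on which the neuron's activation exceeds a threshold; the page does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from any dashboard's code.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires at all; the source page does not define the term.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 7 of which fire.
sample = [0.0] * 9993 + [0.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.074% would mean the neuron is active on roughly 7 of every 10,000 sampled tokens.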

    No Known Activations