    Explanations

    Question-answering

    The neuron flags tokens in the user-provided “behavior” example sentences (i.e., the actual scenario description) rather than in the surrounding instructions or choices.

    Negative Logits
    Token            Logit
    "acles"          -0.07
    "IPLE"           -0.07
    " UNIX"          -0.07
    " homicide"      -0.06
    " لن"            -0.06
    " LIVE"          -0.06
    " Unix"          -0.06
    "aptic"          -0.06
    ".Ui"            -0.06
    "epochs"         -0.06

    Positive Logits
    Token            Logit
    ".scrollView"    0.06
    " частини"       0.06
    " ViewChild"     0.06
    "_far"           0.06
    "-guid"          0.06
    " Chavez"        0.06
    " продукт"       0.06
    "orns"           0.06
    " Dumbledore"    0.06
    " downstream"    0.05
    Activation Density: 0.011%

    No Known Activations