INDEX
    Explanations

    This neuron fires on tokens in the assistant’s informative, explanatory answer passages.

    Negative Logits
    650             -0.07
     pup            -0.07
    (common         -0.07
     پرو            -0.06
                    -0.06
     acos           -0.06
                    -0.06
    otomy           -0.06
     WA             -0.06
    ِك              -0.06
    Positive Logits
    toHave          0.06
     Vanity         0.06
    =").            0.06
    Highlighted     0.06
     imageURL       0.06
     petty          0.06
                    0.06
    .unsubscribe    0.05
    ]")             0.05
    repositories    0.05
    Activation Density: 0.183%

    No Known Activations