INDEX
    Explanations

    The neuron activates on words that signal a step or sequence in instructions, such as "Next".

    instances of the word "Next."

    Negative Logits
    Token               Logit
    " Feldman"          -0.75
    " sexes"            -0.67
    " brim"             -0.66
    " Sinai"            -0.64
    "utic"              -0.60
    "iveness"           -0.60
    " Ãĸ"               -0.60
    " Erie"             -0.59
    "ted"               -0.59
    " heterogeneity"    -0.59

    Positive Logits
    Token               Logit
    "ĻĤ"                0.80
    "Next"              0.78
    "door"              0.76
    "Scene"             0.73
    " millenn"          0.72
    " installment"      0.69
    "ļéĨĴ"              0.69
    " week"             0.68
    "Phase"             0.68
    "phase"             0.68
    Activation Density: 0.035%

    No Known Activations