    Explanations

    The neuron activates on the token “up”, both as a standalone word and as the leading piece of “up-” prefixed words.
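As a rough illustration of the pattern this explanation describes, the sketch below finds words that are "up" or begin with "up" in a piece of text. It is only a textual proxy, not the neuron's actual activation function; the names `UP_PATTERN` and `find_up_words` are hypothetical, and the case-insensitive matching is an assumption rather than something the explanation states.

```python
import re

# Hypothetical helper: a rough textual proxy for the explanation above,
# NOT the neuron's real activation function.
# Assumption: matching is case-insensitive ("Up" counts as well as "up").
# \b requires "up" to start a word, so "group" or "cups" do not match.
UP_PATTERN = re.compile(r"\bup[\w-]*", re.IGNORECASE)


def find_up_words(text: str) -> list[str]:
    """Return every word that is "up" or starts with "up", in order."""
    return UP_PATTERN.findall(text)


if __name__ == "__main__":
    print(find_up_words("Look up the update, then upgrade."))
```

A predicate like this could serve as a quick sanity check against text samples, but it works on whole words, whereas the neuron operates on tokens, so the two will not always agree.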

    Negative Logits
    Token              Logit
    " Prote"           -0.08
    "Nom"              -0.07
    " Synthetic"       -0.07
    " sterile"         -0.07
    "sen"              -0.07
    "errno"            -0.07
    "Karen"            -0.07
    (token missing)    -0.06
    (token missing)    -0.06
    "filename"         -0.06
    Positive Logits
    Token              Logit
    "(up"              0.07
    "(update"          0.07
    "Up"               0.07
    " according"       0.07
    " upd"             0.07
    " up"              0.07
    " fino"            0.07
    " increased"       0.07
    ",只"              0.06
    " [+"              0.06
    Activation Density: 0.021%

    No Known Activations