INDEX
    Explanations

    code snippets

    The neuron activates on Python code tokens (i.e., tokens inside code blocks) rather than on narrative text.

    Negative Logits
    "ovie"                  -0.07
    "EZ"                    -0.07
    " evaluations"          -0.06
    ".movie"                -0.06
    " confidential"         -0.06
    "ords"                  -0.06
    "-X"                    -0.06
    "なく"                  -0.06
    "InternalEnumerator"    -0.06
    "_videos"               -0.06

    Positive Logits
    " Seeing"               0.07
    "ılığ"                  0.07
    (blank token)           0.07
    ".You"                  0.07
    ".There"                0.06
    " propagate"            0.06
    (blank token)           0.06
    " calming"              0.06
    " zun"                  0.06
    "withstanding"          0.06
    Activation Density: 0.041%

    No Known Activations