    Explanations

    Chat/forum snippets

    This neuron detects the first-person pronoun “I” when the assistant refers to itself in its own messages.

    Negative Logits
    "артам"       -0.07
    "money"       -0.07
    "地域"        -0.06
    " biopsy"     -0.06
    "Release"     -0.06
    "Indiana"     -0.06
    "ج"           -0.06
    " Regions"    -0.06
    " neue"       -0.06
    " release"    -0.06
    Positive Logits
    " linux"      0.06
    "Thus"        0.06
    "ват"         0.06
    " 정신"       0.06
    "Undefined"   0.06
                  0.06
    ".dst"        0.06
    "vale"        0.06
    "ọng"         0.05
    "аниц"        0.05
    Activation Density: 0.063%

    No Known Activations