    Explanations

    This neuron activates on words used when asking for clarification or more information: tokens such as “understand,” “better,” and “please” that signal a polite request for details.

    Negative Logits
    .sys           -0.07
     ngOn          -0.07
     startPos      -0.06
     Гол           -0.06
     Если          -0.06
     commentary    -0.06
    _storage       -0.06
    >()↵↵          -0.06
     vertically    -0.06
     provisional   -0.06
    Positive Logits
     디자인            0.07
     actividades    0.07
    [token missing] 0.07
    ーズ              0.07
    pair            0.07
     illuminate     0.07
     classes        0.07
    _MT             0.06
    [token missing] 0.06
    _ros            0.06
    Act Density 0.014%

    No Known Activations