INDEX
    Explanations

    This neuron detects the formal discourse markers of academic papers: words and phrases such as “We,” “Our,” “The,” “goal,” and “in this setting” that introduce problem statements, assumptions, and main contributions.

    Negative Logits
     đứng         -0.07
    fait          -0.07
    ADB           -0.06
    Tx            -0.06
    flash         -0.06
    อร            -0.06
    dv            -0.06
    .put          -0.06
     이러           -0.06
    "]],↵         -0.06
    Positive Logits
     podnik       0.07
    _sex          0.06
     versatility  0.06
     мужчин       0.06
    _parser       0.06
    ной           0.06
     ihrer        0.06
     Project      0.06
    .squeeze      0.06
    (dir          0.06
    Act Density 0.019%

    No Known Activations