    Explanations

    The neuron activates on contrastive discourse markers: words like “but” or “however” that introduce a counterpoint or exception.

    Negative Logits
    "blem"             -0.07
    "Weekly"           -0.07
    "aphore"           -0.07
    "erti"             -0.06
    "่าย"               -0.06
    " AssertionError"  -0.06
    " Laurie"          -0.06
    "_imm"             -0.06
    "Problem"          -0.06
    "یل"               -0.06
    Positive Logits
    " AA"                   0.08
    "зд"                    0.07
    " pierced"              0.06
    " Trends"               0.06
    "_AN"                   0.06
    "rotate"                0.06
    "S"                     0.06
    "PEND"                  0.06
    " Крас"                 0.06
    [token not rendered]    0.06
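The two logit lists above rank tokens by how strongly this neuron pushes their logits down or up. A minimal sketch of one common way such lists are produced, assuming they come from direct logit attribution: take the neuron's output weight vector, project it through the unembedding matrix, and sort the resulting per-token scores. The function `top_logit_effects` and the toy weights are hypothetical, and the sketch ignores the final LayerNorm and any other processing the dashboard may apply.

```python
def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank tokens by the neuron's direct effect on their logits.

    w_out: the neuron's output weight vector, length d_model.
    W_U:   unembedding matrix as d_model rows of n_vocab entries each.
    vocab: n_vocab token strings, in the same order as W_U's columns.
    Returns (top_positive, top_negative), each a list of (token, score).
    """
    d_model, n_vocab = len(w_out), len(vocab)
    if len(W_U) != d_model or any(len(row) != n_vocab for row in W_U):
        raise ValueError("shape mismatch between w_out, W_U and vocab")
    # score_j = sum_i w_out[i] * W_U[i][j], the dot product of w_out with column j
    scores = [
        sum(w_out[i] * W_U[i][j] for i in range(d_model))
        for j in range(n_vocab)
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]        # lowest-scoring tokens first
    top_positive = ranked[::-1][:k]  # highest-scoring tokens first
    return top_positive, top_negative


# Toy example with d_model = 2 and a 4-token vocabulary (made-up numbers).
w_out = [1.0, -0.5]
W_U = [
    [0.2, -0.1, 0.0, 0.3],
    [0.1, 0.4, -0.2, 0.0],
]
vocab = ["a", "b", "c", "d"]
pos, neg = top_logit_effects(w_out, W_U, vocab, k=2)
print([tok for tok, _ in pos])  # ['d', 'a']
print([tok for tok, _ in neg])  # ['b', 'c']
```

With a real model's vocabulary, the bottom of the ranking is where the negative scores sit, which is what the Negative Logits list reports.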
    Activation Density: 0.033%
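A density of 0.033% means the neuron fires on roughly one token in every 3,000. A minimal sketch of the calculation, assuming the density is the percentage of token positions in some evaluation corpus where the neuron's activation is above zero; the function `activation_density` and its `threshold` parameter are hypothetical, and the dashboard may use a different corpus or cutoff.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation is above `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Made-up example: 33 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(33):
    acts[i * 3000] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.033%
```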

    No Known Activations