INDEX
    Explanations

    This neuron activates on the conjunction “but” when it is used contrastively (adversatively).

    Negative Logits
    Token           Logit
    " madness"      -0.07
    "-Con"          -0.06
    " gef"          -0.06
    "Circle"        -0.06
    " од"           -0.06
    " Came"         -0.06
    " fax"          -0.06
    "pants"         -0.06
    "DATE"          -0.06
    " STORY"        -0.06
    Positive Logits
    Token                Logit
    " Haj"               0.07
    "Router"             0.06
    " superheroes"       0.06
    " εμπ"               0.06
    "_bm"                0.06
    " altered"           0.06
    " binaries"          0.06
    " Webb"              0.06
    "ConstraintMaker"    0.06
    " explorer"          0.06
    Activation Density: 0.059%

    No Known Activations