    Explanations

    This neuron fires on tokens in the model's response "No, the summary is not factually consistent with the document." That is, it detects the explicit "No" negation and the surrounding phrasing that rejects factual consistency.

    Negative Logits
    " Closure"            -0.07
    "ERY"                 -0.07
    "ery"                 -0.06
    "sharp"               -0.06
    " це"                 -0.06
    " Antarctica"         -0.06
    (token not shown)     -0.06
    " Mitar"              -0.06
    " minions"            -0.06
    (token not shown)     -0.06
    Positive Logits
    " bergen"             0.07
    "рис"                 0.06
    "(dec"                0.06
    (whitespace token)    0.06
    ",SIGNAL"             0.06
    (whitespace token)    0.06
    "<div"                0.06
    "(QWidget"            0.06
    ".hl"                 0.06
    " footer"             0.06
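    One common way to produce lists like these is to project the neuron's output weights through the model's unembedding matrix and keep the most negative and most positive entries (the neuron's direct effect on each output logit). The sketch below assumes a plain linear projection with no layer-norm folding or scaling; the helper name `logit_effects` and the argument shapes are hypothetical and not this dashboard's actual code.

```python
import numpy as np


def logit_effects(w_out_row, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by one neuron's direct logit effect.

    w_out_row: (d_model,) output weights of the neuron (assumed layout).
    W_U: (d_model, d_vocab) unembedding matrix (assumed layout).
    vocab: list of d_vocab token strings.
    """
    # Direct effect of one unit of neuron activation on every output logit.
    effects = np.asarray(w_out_row) @ np.asarray(W_U)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
w = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1],
                [9.0, 9.0, 9.0]])
neg, pos = logit_effects(w, W_U, ["a", "b", "c"], k=2)
print(neg)  # most negative tokens first
print(pos)  # most positive tokens first
```

    Because only the direct path through the unembedding is measured, small, diffuse values like the ±0.06 to 0.07 seen here suggest the neuron has little direct effect on any single output token.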
    Activation Density 0.033%
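    Activation density is the share of dataset tokens on which the neuron is active. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0; the threshold and the function name `activation_density` are illustrative assumptions, not the dashboard's documented definition.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())


# Toy activations over 8 tokens: the neuron is active on 2 of them.
acts = [0.0, 1.3, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0]
density = activation_density(acts)
print(f"Activation Density {density:.3%}")
```

    Under this definition, a density of 0.033% means the neuron is active on roughly 1 token in 3,000.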

    No Known Activations