    Explanations

    Reasoning and explanations

    inconsistencies between summaries and the original document's factual content.

    This neuron fires on discourse markers and numerals in step-by-step explanations (for example, “Therefore” or numeric tokens), which signal logical or quantitative reasoning steps.

    Negative Logits
     Cs          -0.07
     darauf      -0.07
     распрост    -0.07
     Related     -0.06
     meshes      -0.06
    LCD          -0.06
    ären         -0.06
    Band         -0.06
    آم           -0.06
    ales         -0.06
    Positive Logits
     possível    0.06
     Grass       0.06
    JI           0.06
                 0.06
    INTER        0.06
    ених         0.06
     dynamic     0.06
     pige        0.06
    одав         0.06
     lightly     0.06
    Act Density 0.057%
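
To give a sense of scale for the density figure above, here is a minimal sketch. It assumes that activation density means the fraction of token positions on which the neuron's activation is above zero; the page does not define the term, so that reading is an assumption. The function name `activation_density` and the toy data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not real activations): 57 active positions out of 100,000 tokens.
toy = [1.0] * 57 + [0.0] * (100_000 - 57)
print(f"{activation_density(toy):.3%}")
```

Under that assumption, a density of 0.057% corresponds to roughly 57 active tokens per 100,000.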

    No Known Activations