INDEX
    Explanations

    The neuron detects the “Yes/No” answer‐option tokens (including the slash) in the consistency‐checking prompt.

    Negative Logits
    ()))↵
    -0.07
    ичес
    -0.06
     Qaeda
    -0.06
    enas
    -0.06
    -Qaeda
    -0.06
    ाइम
    -0.06
    	script
    -0.06
    _isr
    -0.06
    (problem
    -0.06
    ानत
    -0.06
    Positive Logits
    _specific
    0.07
     annually
    0.07
    гар
    0.06
     RPG
    0.06
     VK
    0.06
     responding
    0.06
     fries
    0.06
     invariably
    0.06
     gồm
    0.06
    om
    0.06
    Act Density 0.002%

    No Known Activations