    Explanations

    Explanations and reasons

    The neuron activates selectively on the question word “why,” detecting prompts that ask for an explanation.

    Negative Logits
    elez          -0.07
    Это           -0.06
     television   -0.06
    GMEM          -0.06
    _threads      -0.06
    GREEN         -0.06
    _STOP         -0.06
     mission      -0.06
    Second        -0.06
     persuade     -0.06
    Positive Logits
     hydr         0.08
    .getModel     0.07
     OSP          0.07
     dành         0.07
     ensured      0.07
    _ADMIN        0.06
     Ralph        0.06
     principle    0.06
    (fi           0.06
    ]=[           0.06
    Act Density 0.016%
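
    As a rough illustration of how an activation density figure like the one above could be computed, the sketch below assumes density means the fraction of token positions on which the neuron's activation exceeds zero. That definition, the function name activation_density, and the sample data are all assumptions made for illustration; they do not come from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the neuron fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 2 active positions out of 12,500 tokens.
sample = [0.0] * 12_498 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under this assumed definition, a density of 0.016% would correspond to roughly 16 active token positions per 100,000 tokens.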

    No Known Activations