INDEX
    Explanations

    question answering

    This neuron activates on causal-explanation phrasing, i.e. on words inside reason clauses of the form “because/does…this…to/be…(adjective)”.

    NEGATIVE LOGITS
    --*/↵       -0.08
    Tables      -0.07
     bred       -0.06
     """↵       -0.06
     porn       -0.06
     """↵↵      -0.06
                -0.06
    Table       -0.06
    ******/     -0.06
     statues    -0.06
    POSITIVE LOGITS
     Reserved   0.07
    urch        0.07
     zdraví     0.06
    $options    0.06
    _qp         0.06
     tightly    0.06
     olig       0.06
     fy         0.06
    vailable    0.06
    öt          0.06
    Act Density 0.006%

    No Known Activations