INDEX
    Explanations

    instructions and explanations

    The neuron activates on mid-frequency content words typical of explanatory answer sentences, signaling when detailed, informational language is being used.

    Negative Logits
     Could
    -0.07
    _stride
    -0.06
    	and
    -0.06
    chimp
    -0.06
    	throw
    -0.06
    	continue
    -0.06
    above
    -0.06
     neighbor
    -0.06
    [i
    -0.06
    }'",
    -0.06
    Positive Logits
     shemale
    0.07
     Lima
    0.07
    Cls
    0.06
    _CONFIRM
    0.06
    .nextElement
    0.06
     перев
    0.06
    xaf
    0.06
     disadv
    0.06
    说话
    0.06
     RU
    0.06
    Act Density 0.027%
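
For reference, an activation density figure like the one above is commonly read as the fraction of sampled token positions on which the neuron fires. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical, and the dashboard's actual computation may differ.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0.0 here).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 27 active positions out of 100,000 sampled tokens.
sample = [1.0] * 27 + [0.0] * 99_973
print(f"{activation_density(sample):.3%}")  # prints 0.027%
```

A density this low means the neuron is silent on the vast majority of tokens, which fits the absence of recorded activations below.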

    No Known Activations