Explanations
question answering
This neuron activates on causal-explanation phrasing, i.e. on words inside reason clauses of the form "because/does ... this ... to/be ... (adjective)".
Negative Logits
Token       Logit
--*/↵       -0.08
Tables      -0.07
bred        -0.06
"""↵        -0.06
porn        -0.06
"""↵↵       -0.06
↵           -0.06
Table       -0.06
******/     -0.06
statues     -0.06
Positive Logits
Token       Logit
Reserved    0.07
urch        0.07
zdraví      0.06
$options    0.06
_qp         0.06
tightly     0.06
olig        0.06
fy          0.06
vailable    0.06
öt          0.06
Activations Density 0.006%
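
To make the density figure concrete, here is a minimal sketch of how such a percentage could be computed. It assumes that "density" means the share of sampled tokens on which the neuron's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: density = active tokens / total tokens, as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the neuron fires on 6 of 100,000 tokens.
sample = [0.0] * 99_994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.006% would correspond to roughly 6 active tokens per 100,000 sampled.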