Explanations
This neuron activates on explanatory, pedagogical language: phrases that describe helping someone understand something or showing how something works (e.g., “help us understand how,” “looks at how,” “can help us understand”).
Negative logits
Token         Logit
إن            -0.07
discs         -0.07
awake         -0.07
Mana          -0.06
bg            -0.06
배            -0.06
Broad         -0.06
aside         -0.06
denote        -0.06
Atmos         -0.06
Positive logits
Token         Logit
cdot          0.06
"],["         0.06
okableCall    0.06
owner         0.06
سود           0.06
ast           0.06
\Carbon       0.06
-symbol       0.05
_CODE         0.05
ième          0.05
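The positive and negative logit lists presumably show the tokens whose output logits this neuron most raises or lowers. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the neuron's output weight vector through the model's unembedding matrix and take the k largest and k smallest entries. The minimal NumPy sketch below illustrates that projection; `w_out`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical names, and the toy matrices are made up for illustration.

```python
import numpy as np


def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one neuron (a sketch).

    Assumptions (hypothetical, not taken from this page):
      w_out: (d_model,) output weight vector of the neuron
      W_U:   (d_model, d_vocab) unembedding matrix
      vocab: list of d_vocab token strings
    """
    # Logit change per unit of neuron activation, one entry per token.
    effects = w_out @ W_U
    # Indices sorted from most negative to most positive effect.
    order = np.argsort(effects)
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three tokens.
    w_out = np.array([1.0, 0.0])
    W_U = np.array([[0.5, -0.2, 0.1],
                    [0.3, 0.3, 0.3]])
    neg, pos = top_logit_effects(w_out, W_U, ["a", "b", "c"], k=1)
    print(neg, pos)  # [('b', -0.2)] [('a', 0.5)]
```

Real dashboards may also fold in layer-norm scaling before the projection, which this sketch ignores.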
Activation density: 0.192%
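Assuming the activation density is the share of token positions in the sampled text where the neuron's activation is above zero (an assumption, since the page does not define it), 0.192% corresponds to about 1.92 activating tokens per 1,000, or roughly 1 in 520. A minimal sketch of that calculation follows; `activation_density` and its `threshold` parameter are hypothetical names.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the neuron fires above `threshold`.

    Assumption (hypothetical): "density" counts activations strictly
    greater than zero over all sampled token positions.
    """
    acts = np.asarray(activations, dtype=float)
    # Mean of a boolean mask gives the fraction of positions that fire.
    return float((acts > threshold).mean())


if __name__ == "__main__":
    # Toy example: the neuron fires on one of four token positions.
    print(activation_density([0.0, 0.0, 1.3, 0.0]))  # 0.25
```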