INDEX
Explanations
The neuron activates on subword pieces that mark list or enumeration formatting (e.g., numbered bullets, asterisks, item-start tokens).
Negative Logits
cellar       -0.07
(in          -0.06
barley       -0.06
Split        -0.06
/design      -0.06
Hernandez    -0.06
bosses       -0.06
ervations    -0.06
cousin       -0.06
liability    -0.06
Positive Logits
管            0.08
addListener   0.07
.findAll      0.06
sizlik        0.06
="'.          0.06
orWhere       0.06
ości          0.06
uming         0.06
енными        0.06
(!$           0.06
Activations Density 0.020%
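
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a number could be computed. It assumes density means the fraction of token positions on which the neuron's activation exceeds a threshold; that definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is (number of active tokens) / (total tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2 active positions out of 10,000 tokens.
acts = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.020%
```

Under this assumed definition, a density of 0.020% would correspond to roughly 2 activating tokens per 10,000.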