Explanations
The neuron activates on the special control tokens that mark conversation structure (e.g. “<|start_header_id|>”, “<|end_header_id|>”, speaker tags, and other header/footer markers).
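As a rough illustration of which positions this explanation refers to, the sketch below locates control tokens in a chat-formatted string. It does not compute the neuron's activation; it only marks where the explanation predicts the neuron would fire. The `<|name|>` pattern is an assumption generalized from the two examples above, and `find_control_tokens` and the sample prompt are hypothetical names introduced for illustration.

```python
import re

# Assumption: control tokens follow the "<|name|>" pattern seen in the
# examples above; other tokenizers may mark structure differently.
CONTROL_TOKEN = re.compile(r"<\|[^|<>]+\|>")


def find_control_tokens(text: str) -> list[tuple[int, str]]:
    """Return (character offset, token) pairs for each control token in text."""
    return [(m.start(), m.group()) for m in CONTROL_TOKEN.finditer(text)]


# Hypothetical chat-formatted prompt, used only for illustration.
sample = "<|start_header_id|>user<|end_header_id|>\n\nHello there"

for offset, token in find_control_tokens(sample):
    print(offset, token)
```

A real check would compare these positions against the recorded activations for the neuron, which this page does not list.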
Negative Logits
(cls          -0.07
[offset       -0.07
Triple        -0.07
Ě             -0.07
уп            -0.06
lopen         -0.06
_game         -0.06
(conf         -0.06
Authorities   -0.06
LIMIT         -0.06
Positive Logits
smlou         0.07
践            0.06
sen           0.06
/place        0.06
準            0.06
-US           0.06
and           0.06
uteur         0.06
HashCode      0.06
nást          0.06
Activations Density 0.096%