Hi everyone, I'm an undergrad and I'd like to tackle the task of designing a DSP. I'm wondering if anyone can help point me in the right direction in terms of common considerations for DSP ISAs, considerations for dedicated hardware, best practices for MACs, etc. I've designed a few RISCs previously so I'm up to speed on the basics at least. My main concern is understanding what DSP programmers expect from their hardware. Any pointers or suggestions for texts would be greatly appreciated! Thanks
Texts on generic DSP architecture designs?
Started by WaveRider, March 2, 2012
Reply posted March 2, 2012
On 3/2/2012 7:28 AM, WaveRider wrote:

First, recognize that DSPs are optimized for different niches, just as microcontrollers are. If I were undertaking your project, I would study the block diagrams and instruction sets of a few representative processors to learn at least what specific questions to ask.

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply posted March 2, 2012
On Fri, 02 Mar 2012 14:09:13 -0500, Jerry Avins wrote:
> First, recognize that DSPs are optimized for different niches, just as microcontrollers are.

Yup.

To take the two that I'm most familiar with:

The ADSP21xx DSP instruction set is very versatile, because they made the hardware looping explicit; this meant that you could do more than just a chain of MAC instructions way fast. Most notably for me, you could evaluate

y = a0 + a1 * x + a2 * x^2 + a3 * x^3 + ...

at two clocks per term -- one to square x, and one to do the MAC.

But each register in the ADSP21xx has a different set of overlapping special purposes, so writing code for it was very much a puzzle-solving exercise: you would find yourself painted into a corner and have to go back half a dozen instructions, change a choice of register, then try again.

Moreover, that nifty hardware looping construct used a hardware stack that was not accessible from software -- so you could interrupt a hardware loop, but you couldn't have an RTOS with independent tasks that did hardware looping (because there was no way to suck the context out of that stack, or to restore it).
And, the versatility of the instruction set came at the cost of a strictly enforced Harvard architecture, because the instruction bus was 24 bits wide while the data bus was 16 bits. You could access one from the other, but only awkwardly.

The TMS320F2812 has a far less versatile and more traditional instruction set, but the entire processor context is visible and settable from software, and the instruction bus is 16 bits wide. It will still do a single-cycle MAC, but with severely restricted hardware looping (there's an instruction that basically says "do the next instruction N times" -- and the processor holds off interrupts until those N operations are done).

However, within the context of this restricted (compared to the ADSP21xx) versatility to do fast math, the TMS320F2812 did a much better job of doing "regular old" processor tasks.

Since the things that I was doing with these chips were basically "all that and DSP, too", the '2812 was a much better fit: in general, we were doing work where 95% of the lines of code were general-purpose code involved with an RTOS, talking to other processors, flipping bits on I/O ports, etc., and just 5% of the lines of code were doing the DSP necessary for motor control, video synchronizing, and other "DSP" things.

But we still needed the DSP: those 5% of the lines of code would often consume 25% or more of the available processor time; trying to do the same tasks on a conventional processor would have been a disaster.

My dream processor would be some melding of a RISC processor and a DSP, done in such a way that the following would hold:

1: single-cycle vector dot products (MAC, increment two index registers, access memory, check hardware loop, repeat).
2: universally available, interruptible context.
3: fast context switches.
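Item 1 on that wish list is the inner loop every DSP is built around. As a rough sketch in C (not any particular chip's instruction set), each loop iteration below is what a single-cycle MAC instruction does in one clock: fetch two operands, advance two pointers, multiply, and add into a wide accumulator. The Q15 data format, the 64-bit accumulator, and the function names are assumptions chosen for illustration; real parts differ in accumulator width (guard bits) and rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Q15 dot product written the way a single-cycle MAC loop works.
 * The 64-bit accumulator stands in for the wide guard-bit
 * accumulator a DSP typically provides (an assumption). */
int64_t dot_q15(const int16_t *x, const int16_t *h, size_t n)
{
    assert(x != NULL && h != NULL);
    int64_t acc = 0;                  /* wide accumulator */
    const int16_t *px = x;            /* "index register" 1 */
    const int16_t *ph = h;            /* "index register" 2 */
    for (size_t i = 0; i < n; i++) {  /* the hardware loop */
        /* one MAC: fetch both operands, multiply, accumulate, post-increment */
        int32_t prod = (int32_t)(*px++) * (*ph++);
        acc += prod;
    }
    return acc;                       /* result in Q30 */
}

/* Convert a Q30 accumulator back to Q15 with rounding and saturation.
 * Assumes arithmetic right shift of negative values, which is
 * implementation-defined in C but what common compilers do. */
int16_t acc_to_q15(int64_t acc)
{
    acc = (acc + (1 << 14)) >> 15;    /* round to nearest, drop 15 fraction bits */
    if (acc > INT16_MAX) return INT16_MAX;
    if (acc < INT16_MIN) return INT16_MIN;
    return (int16_t)acc;
}
```

For example, the vectors (0.5, 0.5) and (0.5, -0.25) in Q15 give 0.125, i.e. 4096 in Q15 after conversion.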
I've dinked with how to do this, and I would probably try to make it happen by having a register file that was split between hard registers and a set of registers that would echo the top of the stack (I think there's a RISC processor that already does this -- SPARC?). Then I would make that 'shadowed' register file such that it holds all the context necessary for a MAC and whatever other cool things I could do. _And_ (keep in mind that I'm not a logic designer -- all this cool stuff is coming for free so far) I'd set it up so that the registers were automagically cached -- changing the stack pointer would mark all the registers for saving away and being replaced, but the actual writes and reads would initially only happen on an as-needed basis, to speed up context switches.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
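A toy C model of the lazily saved register file described in that post may help make the idea concrete. Everything here is hypothetical (the register count, the task count, and the names `lazy_regfile`, `rf_switch`, `rf_read`, `rf_write`): a context switch only changes which task is current, and a register is actually saved and reloaded the first time it is touched afterward. This is not how SPARC register windows work (those overlap between calls and spill to memory on window overflow); it only models the on-demand swapping idea.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS  4   /* hypothetical size of the shadowed register file */
#define NTASKS 2   /* hypothetical number of task contexts */

typedef struct {
    int32_t  value[NREGS];         /* physical register contents */
    int      owner[NREGS];         /* task whose value each register holds */
    int      current;              /* task whose context is active */
    int32_t  save[NTASKS][NREGS];  /* per-task save areas in memory */
    unsigned spills, fills;        /* memory traffic actually performed */
} lazy_regfile;

void rf_init(lazy_regfile *rf, int task)
{
    assert(task >= 0 && task < NTASKS);
    for (int r = 0; r < NREGS; r++) {
        rf->value[r] = 0;
        rf->owner[r] = task;
        for (int t = 0; t < NTASKS; t++)
            rf->save[t][r] = 0;
    }
    rf->current = task;
    rf->spills = rf->fills = 0;
}

/* The context switch itself only changes the current task: no copying. */
void rf_switch(lazy_regfile *rf, int task)
{
    assert(task >= 0 && task < NTASKS);
    rf->current = task;
}

/* On first touch after a switch, save the previous owner's value and
 * load the current task's value. (A real design could skip the load
 * when the access is a pure write.) */
static void rf_claim(lazy_regfile *rf, int r)
{
    assert(r >= 0 && r < NREGS);
    int cur = rf->current;
    if (rf->owner[r] != cur) {
        rf->save[rf->owner[r]][r] = rf->value[r];
        rf->spills++;
        rf->value[r] = rf->save[cur][r];
        rf->fills++;
        rf->owner[r] = cur;
    }
}

int32_t rf_read(lazy_regfile *rf, int r)
{
    rf_claim(rf, r);
    return rf->value[r];
}

void rf_write(lazy_regfile *rf, int r, int32_t v)
{
    rf_claim(rf, r);
    rf->value[r] = v;
}
```

After a switch, only the registers the new task actually touches cost any memory traffic, which is the speedup the post is after.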
Reply posted March 3, 2012
Tim Wescott wrote:
> But each register in the ADSP21xx has a different set of overlapping special purposes, so writing code for it was very much a puzzle-solving exercise: you would find yourself painted into a corner and have to go back half a dozen instructions, change a choice of register, then try again.

An annoying aspect of those special-purpose instructions is that, historically, vendors provided no information about how to use them at the time the core was released. For example, most cores have provided a way to do a subset of LMS adaptation really quickly, but only much later did any app notes appear showing how to use them. Sure, you could fairly quickly spot that a certain instruction must have this special purpose, but figuring out how to get the best from it could take hours. What's wrong with some example code in the instruction set manual, guys?

Steve
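For reference, here is the textbook LMS update that such special instructions accelerate, sketched in C with doubles for readability. A real DSP version would be fixed point, and exactly which parts a given core fuses into one instruction varies by vendor; the function name `lms_step` and its argument layout are assumptions. Per tap, the filter needs one MAC for the output and one multiply-add for the weight update, which is why fusing the two pays off.

```c
#include <assert.h>
#include <stddef.h>

/* One LMS iteration for an n-tap adaptive FIR filter.
 * x[0] is the newest input sample, x[n-1] the oldest.
 * Updates w in place and returns the error e = d - y,
 * where y is computed with the old weights. */
double lms_step(double *w, const double *x, size_t n, double d, double mu)
{
    assert(w != NULL && x != NULL);
    double y = 0.0;
    for (size_t k = 0; k < n; k++)
        y += w[k] * x[k];          /* FIR output: a chain of MACs */
    double e = d - y;              /* error against the desired signal */
    double g = mu * e;             /* step size times error, computed once */
    for (size_t k = 0; k < n; k++)
        w[k] += g * x[k];          /* weight update: one more multiply-add per tap */
    return e;
}
```

With zero initial weights, x = {1, 0.5}, d = 1 and mu = 0.5, one step gives e = 1 and weights {0.5, 0.25}.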
Reply posted March 6, 2012
steveu wrote:
> An annoying aspect of those special purpose instructions is that historically vendors provided no information about how to use them at the time the core was released.

I have developed a lot of embedded systems compilers (28), and many of them were for new or substantially revised instruction sets. The reason silicon vendors have so little information on their instruction set usage is that they literally don't know how the instruction set can best be used in the context of full applications.

Before the flames start: it is a learning process for the software tools vendors as well. It takes about 300 design-ins before the clear advantages and disadvantages of a new instruction set start to emerge.

w..
--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com
Reply posted March 6, 2012
On Mar 2, 2:59 pm, Tim Wescott <t...@seemywebsite.com> wrote:
> My dream processor would be some melding of a RISC processor and a DSP ...

Don't they have that in chips which combine a DSP with a general-purpose CPU on one die as separate entities? Then the CPU can do all your control stuff and the DSP core can take the signal-processing load off it efficiently. I assume interrupts only need to bother the CPU and not the DSP, unless the interrupt is from the CPU, maybe.

Wouldn't that do your job well?
Otherwise, consider what you might do with an array of small processors, each with a very small amount of working memory and high-speed communications between them, so that each can be treated as a processing element within a larger, highly flexible processor. In essence, you can microprogram your own ultimate DSP processor!

Rick
Reply posted March 6, 2012
On 3/6/2012 6:16 AM, rickman wrote:
> Don't they have that in chips which combine a DSP with a general purpose CPU on a die as separate entities?

Well, I've worked on designs that had just that. It was a nightmare from a software perspective. Here's why:

Case 1

We were doing image processing. We had access to some really good image processing libraries that were written in C. But there was no compiler for a heterogeneous array of CPUs. I did work with some folks on such a thing, but it never came to fruition, and I don't know if such a thing exists in *any* context today. Obviously the "partitioning language" would be a big part of it. Of course these chips worked and we did develop software for them, but it was all from scratch except for a library that had been done for that chip specifically (TMS320C80).

Case 2

I also worked on a 16-processor array of TMS320C30s or C40s (one board) that had two communication channels:
- a high-speed bus, and
- a serial communication mesh.

I was *never* able to get the software guys to even address the recommended "rules" for using those communication channels, for example: control and interprocessor management on one, and data on the other. Nobody could think that far ahead. Can you imagine just letting software developers loose on a machine like that with no rules set by their gurus? (Not that there wouldn't be any "aha!s".)
Unfortunately, the hardware design proved too difficult and we never had the opportunity to program the real thing.

Anyway, the point is that we tend to rely on reusable software, and heterogeneous machines (or machines with multiple communication channels) present interesting problems that can't be handled with a compiler.

Fred
Reply posted March 6, 2012
On Tue, 06 Mar 2012 06:16:06 -0800, rickman wrote:
> Don't they have that in chips which combine a DSP with a general purpose CPU on a die as separate entities?

We considered that, and rejected them. First, because you pay up the wazoo for chips like that. Second, because when you're designing a system that needs to last for decades, you want to avoid boutique parts like the plague -- using some brand-new DSP chip was bad enough (but it was from TI, who is at least fairly trustworthy). Third, as Mr. Marshall alluded to, developing software for multiple tightly coupled CPUs is a bitch.

Fortunately the part that we used (the TI TMS320F2812) was designed to address exactly the problem that we were solving (lots of GP code that executes seldom, a little bit of DSP code that executes a lot), and aside from some startlingly bone-headed compiler quirks it worked superlatively well.

--
Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
Reply posted March 6, 2012
On Tue, 06 Mar 2012 07:24:01 -0500, Walter Banks wrote:
> The reason silicon vendors have so little information on their instruction set usage is they literally don't know how the instruction set can be best used in the context of full applications.

In the case of the ADSP21xx processors, I could see how the register usage came about -- they were clearly saving precious silicon, and keeping their logic paths short, by doing what they did. As difficult as it was to program those things by hand, I couldn't imagine that writing a truly optimizing compiler for the thing would be anything but a nightmare, if you could even express the intended functionality in a GP language like C. The C virtual machine just doesn't address the needs of fixed-point DSP computations.

And, indeed, the C compiler that we used for the 21xx basically worked by choosing a subset of the registers that could be treated as general-purpose registers, and simply not using any of the "DSP-ish" features of the part. So we did all of the general-purpose stuff in C (probably very inefficiently), and the actual DSP happened in fewer than 100 lines of assembly code whose data structures were set up in C, and which were called from C routines.

--
Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
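To illustrate what "the C virtual machine just doesn't address", here is a sketch of Q15 fractional arithmetic written in portable C. A fixed-point DSP typically does each of these operations in a single instruction with hardware rounding and saturation; in C every step (widening, rounding, shifting, clamping) has to be spelled out, and a compiler has to recognize the whole pattern to map it back onto one instruction. The helper names are hypothetical, and the right shift of a negative value assumes arithmetic shifting, which is implementation-defined in C but what common compilers do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Q15 fractional multiply with rounding and saturation. */
int16_t q15_mul_sat(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;        /* exact Q30 product */
    p = (p + (1 << 14)) >> 15;         /* round and rescale to Q15 */
    if (p > INT16_MAX) p = INT16_MAX;  /* only -1.0 * -1.0 overflows */
    if (p < INT16_MIN) p = INT16_MIN;
    return (int16_t)p;
}

/* Saturating Q15 add. Without the clamps, converting an out-of-range
 * sum back to int16_t is implementation-defined (it usually wraps). */
int16_t q15_add_sat(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (int16_t)s;
}

/* Scale a Q15 vector in place by a Q15 gain. */
void q15_scale(int16_t *v, size_t n, int16_t gain)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        v[i] = q15_mul_sat(v[i], gain);
}
```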
Reply posted March 6, 2012
> The ADSP21xx DSP instruction set is very versatile, because they made the
> hardware looping explicit; this meant that you could do more than just a
> chain of MAC instructions way fast (most notably for me, you could do
>
> y = a0 + a1 * x + a2 * x^2 + a3 * x^3 + ...
>
> at two clocks per term -- one to square x, and one to do the MAC).

Maybe not the best example. One usually recommends evaluating polynomials with Horner's rule:

y = a0 + (a1 + (a2 + a3*x) * x) * x

It will need MPY and ADD, not MPY and MAC.

--
rgds
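The two approaches can be compared in C. `poly_powers` follows the running-power method from the quoted post (one multiply to advance the power of x, one MAC), and `poly_horner` is Horner's rule (one multiply and one add per term). The function names and the coefficient ordering (a[0] is the constant term) are assumptions for the sketch. On a core whose accumulator cannot feed the multiplier directly, each Horner step becomes a separate multiply and add, as the reply notes.

```c
#include <assert.h>
#include <stddef.h>

/* Running-power method: keep x^k and accumulate a[k] * x^k.
 * Two multiplies per term: advance the power, then the MAC. */
double poly_powers(const double *a, size_t n, double x)
{
    assert(a != NULL && n > 0);
    double y = a[0];
    double xk = 1.0;                /* running power x^k */
    for (size_t k = 1; k < n; k++) {
        xk *= x;                    /* x^k = x^(k-1) * x */
        y += a[k] * xk;             /* multiply-accumulate */
    }
    return y;
}

/* Horner's rule: y = a0 + x*(a1 + x*(a2 + ...)),
 * working from the highest-order coefficient down. */
double poly_horner(const double *a, size_t n, double x)
{
    assert(a != NULL && n > 0);
    double y = a[n - 1];
    for (size_t k = n - 1; k-- > 0; )
        y = y * x + a[k];           /* one multiply, one add */
    return y;
}
```

For a = {1, 2, 3, 4} and x = 2, both return 49; Horner uses n - 1 multiplies instead of 2(n - 1).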






