Tim Wescott <seemywebsite@myfooter.really> writes:
> [...]
> It would only be an advantage if you're doing something high volume and
> it makes for a cheaper bill of materials. I'm just tossing out
> suggestions: it's your job to evaluate them for fitness.

Gotcha. I appreciate the suggestion, Tim.

I should have stated this up front, but for right now this is just for my own pleasure and tinkering. The application (if it ever really materializes) is a "preamp" for my Crown amp and Klipschorns.

--
Randy Yates, DSP/Embedded Firmware Developer
Digital Signal Labs
http://www.digitalsignallabs.com
suggestions on 32-bit dsp
Started by ●April 8, 2016
Reply by ●April 11, 2016
Reply by ●April 11, 2016
On Mon, 11 Apr 2016 18:16:14 +0000, Eric Jacobsen wrote:
> Both the Pi and the BBB platforms are very cheap and very capable. I'm
> wondering how long dedicated DSP processors will still have a market.

I wonder that, too. I have a personal project I'm working on, the processor for which is an ARM Cortex M4 machine, because of the DSP instructions.

Of course, I haven't actually TRIED to make that part work yet -- it's a spare time project, and I haven't had a lot of spare time.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Reply by ●April 11, 2016
On 11.4.16 22:05, Tim Wescott wrote:
> On Mon, 11 Apr 2016 18:16:14 +0000, Eric Jacobsen wrote:
>
>> Both the Pi and the BBB platforms are very cheap and very capable. I'm
>> wondering how long dedicated DSP processors will still have a market.
>
> I wonder that, too. I have a personal project I'm working on, the
> processor for which is an ARM Cortex M4 machine, because of the DSP
> instructions.
>
> Of course, I haven't actually TRIED to make that part work yet -- it's a
> spare time project, and I haven't had a lot of spare time.

The Cortex DSP instructions work well. The main difference from a real DSP comes from the shuttling of operands to the MAC. A DSP pre-fetches the next operands during a MAC, but a Cortex needs two ldr instructions to get the operands in.

--
-TV
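A minimal sketch of the operand traffic described above, written as a Python model rather than real Cortex code. The function name `fir_output` and the counters are hypothetical; the model counts issued operations only and makes no claim about actual cycle counts on any particular core.

```python
def fir_output(coeffs, samples):
    """Compute one FIR output and count the operations a load/store core issues.

    Returns (accumulator, loads, macs). Illustrative model only: it counts
    operations, not cycles.
    """
    acc = 0
    loads = 0
    macs = 0
    for k in range(len(coeffs)):
        c = coeffs[k]    # first load (an ldr on a Cortex): the coefficient
        x = samples[k]   # second load (another ldr): the delayed sample
        loads += 2
        acc += c * x     # one multiply-accumulate
        macs += 1
    return acc, loads, macs


acc, loads, macs = fir_output([1, 2, 3, 4], [10, 20, 30, 40])
print(acc, loads, macs)  # 300 8 4
```

In this model every tap costs two loads plus one MAC. Going by the description in the post, a DSP that fetches the next operands while the current MAC executes would hide those two loads, which is the difference being pointed out.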
Reply by ●April 12, 2016
On Monday, April 11, 2016 at 12:08:47 PM UTC-4, Randy Yates wrote:
> robert bristow-johnson <rbj@audioimagination.com> writes:
>
> > On Sunday, April 10, 2016 at 9:16:42 PM UTC-4, Randy Yates wrote:
> >> robert bristow-johnson <rbj@audioimagination.com> writes:
> >>
> > ...
>
> > if your DSP is doing coefficient calculation, it might be easier with
> > float than fixed.
>
> Why not utilize the best of both worlds?

well, i sorta thought that was *my* point.

> If you're doing coefficient
> calculation, and/or other math functions, at a low rate, then of course
> do that with floats. But by the time you get cranking out the
> convolutions, do that with fixed-point.

how 'bout fast convolutions with an FFT? rather do *that* in fixed?

> I.e., somewhere along the way, convert your resulting coefficients to fixed-point.

well, in the same foreground (or whatever low rate) process where you calculate your coefficients using floating-point math, division, and calls to functions like cos(), whatever process does that can also scale and convert to fixed-point format. (and, really, to do things right, you have to ping-pong the new coefficients, so you're always writing them into the "inactive" ping-pong buffer. that way none of the asynchronicity matters a twit.) converting to fixed is just part of the coefficient update thingie.

> > if i were a silicon guy, i would design an audio DSP strictly without
> > floating point. i would probably repeat most of the decisions that Bob
> > Adams did with the Sigma DSP (hey, that's an idea for you, but it's
> > still ADI). he makes the words Q4.28 (or whatever notation you like)
> > and does exactly the right thing with the least significant word after
> > a multiply or MAC.
>
> But then wouldn't that introduce more error, instead of doing the whole
> group of MACs for a sample, THEN requantize?

so the range of the fixed-point numbers is from -8.000000 to +7.999999. the accumulator has an extra 28 bits on the right and, i think, 16 extra bits on the left. there's no loss of information until the wide word is written back out, and it deals with rounding and saturation about the same as a 56K (i.e. the logical way, i think Bob had really good judgement on that).

> > problem is with the Sigma is that it's too much like an FPGA or some
> > hardware solution. no branch instruction or conditional branch
> > instruction. so you can't really process samples in blocks and you
> > can't have processing modes like you need to efficiently do processing
> > alongside analysis. like what happens in a pitch shifter.
>
> I've heard of that device but didn't know all this about it. I still
> prefer the cozy familiarity of a DSP.

well, it *is* a DSP, it's just that all of the instructions are inline.

> > so, i still agree in principle that, for a given word width, fixed
> > beats IEEE float (with 8 exponent bits) in audio. and that's because
> > we don't need 40 dB of headroom. 6 or 12 dB is enough. then 32-bit
> > fixed leaves 30 bits for your signal whereas the mantissa (plus
> > "hidden 1 bit" plus sign bit) in IEEE-754 has 5 fewer bits.
> >
> > but if you wanna do a realtime audio product with a DSP, i think the
> > easiest thing to do is just do it with a SHArC. especially if you need
> > to do transcendental math, say, to calculate coefficients, you'll have
> > fewer headaches with floating point.
>
> Again, use the best of both worlds, eh?

isn't that what using a processor that does both 32-bit fixed and 40-bit float is?

r b-j
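The coefficient update and the "requantize once" MAC discussed above can be sketched roughly as follows. This is a Python model under stated assumptions, not SigmaDSP or 56K code: all names (`to_q4_28`, `CoeffBanks`, `mac_q4_28`, `lowpass_coeffs`) are hypothetical, the low-pass formulas are the widely used Audio EQ Cookbook ones and stand in for whatever coefficient math is needed, and Python's unbounded integers stand in for the wide accumulator, so the model has more guard bits than the hardware described.

```python
import math

FRAC_BITS = 28                          # Q4.28: 4 integer bits (incl. sign), 28 fractional
Q_MIN, Q_MAX = -(1 << 31), (1 << 31) - 1


def saturate(v):
    """Clamp an integer to the signed 32-bit range."""
    return max(Q_MIN, min(Q_MAX, v))


def to_q4_28(x):
    """Scale a float to Q4.28 with rounding and saturation."""
    return saturate(int(round(x * (1 << FRAC_BITS))))


def from_q4_28(q):
    """Convert a Q4.28 integer back to a float (for inspection only)."""
    return q / (1 << FRAC_BITS)


def lowpass_coeffs(f0, fs, q):
    """Low-rate float math. Returns (b0, b1, b2, -a1, -a2), normalized by a0,
    with the feedback terms already negated so a plain sum of products works."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b1 = (1.0 - math.cos(w0)) / a0
    return (b1 / 2.0, b1, b1 / 2.0, 2.0 * math.cos(w0) / a0, -(1.0 - alpha) / a0)


class CoeffBanks:
    """Ping-pong banks: the writer only ever touches the inactive bank."""

    def __init__(self, initial):
        self.banks = [list(initial), list(initial)]
        self.active = 0

    def update(self, float_coeffs):
        inactive = 1 - self.active
        self.banks[inactive] = [to_q4_28(c) for c in float_coeffs]
        self.active = inactive          # one index flip publishes the new set

    def current(self):
        return self.banks[self.active]


def mac_q4_28(coeffs, state):
    """Sum of products in a wide accumulator; round and saturate once at the end."""
    acc = 0                             # unbounded int models the wide accumulator
    for c, s in zip(coeffs, state):
        acc += c * s                    # full-precision products, no per-product rounding
    acc += 1 << (FRAC_BITS - 1)         # round to nearest
    return saturate(acc >> FRAC_BITS)   # write back as Q4.28


banks = CoeffBanks([0] * 5)
banks.update(lowpass_coeffs(1000.0, 48000.0, 0.7071))
state = [to_q4_28(0.5)] * 3 + [0, 0]    # x[n], x[n-1], x[n-2], y[n-1], y[n-2]
print(from_q4_28(mac_q4_28(banks.current(), state)))
```

The design choice the sketch tries to show is that quantization error enters only once per output sample, at the final write-back, rather than once per product, and that the audio path never sees a half-written coefficient set because the update only flips which bank is active.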
Reply by ●April 12, 2016
On Monday, April 11, 2016 at 2:17:00 AM UTC-4, rickman wrote:
> On 4/10/2016 11:43 PM, robert bristow-johnson wrote:
> >
> > if i were a silicon guy, i would design an audio DSP strictly without
> > floating point.
....
> > problem is with the Sigma is that it's
> > too much like an FPGA or some hardware solution. no branch
> > instruction or conditional branch instruction. so you can't really
> > process samples in blocks and you can't have processing modes like
> > you need to efficiently do processing alongside analysis. like what
> > happens in a pitch shifter.
>
> Hmmm... not sure what limitations you think FPGAs have. I think maybe
> you just don't understand them.

i know you can make a state machine on an FPGA. i'm sure you can even implement a Dynamic Memory Controller, a Program Counter, Stack Pointer, arithmetic registers (like accumulators), ConditionCode/Status Register, and decode opcodes with an FPGA. i'm sure you can implement conditional and unconditional JUMP and JSR instructions with this FPGA programmed to be a CPU. i'm sure you can do that.

but i have developed algs on an ASIC and an FPGA that behaved sorta like the ASIC. like the Sigma, all of the instructions were inline and executed the list from top instruction to the bottom exactly once per sample. none of the hardware designers felt it needed to be any different. sometimes folks just can't see past their assumptions like the only way to process samples is one sample at a time.

and i dunno zilch about Verilog or VHDL or whatever hardware language.

r b-j
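The "inline, once per sample" execution model described above can be pictured with a small Python stand-in. It is only an illustration of the control flow (none), not of any real Sigma or ASIC instruction set; `make_inline_program` and its one-pole filter are hypothetical.

```python
def make_inline_program(gain, feedback):
    """Return a per-sample routine with no branches or loops:
    every statement runs exactly once for each input sample."""
    state = {"z1": 0.0}

    def run_once(x):
        # The whole "program": executed top to bottom once per sample.
        v = x * gain
        y = v + feedback * state["z1"]
        state["z1"] = y
        return y

    return run_once


program = make_inline_program(0.5, 0.5)
print([program(x) for x in [1.0, 0.0, 0.0]])  # [0.5, 0.25, 0.125]
```

Because nothing in `run_once` can branch, there is no way to express "every N samples, do this extra analysis pass", which is the limitation being described for block processing and mode-dependent work such as pitch shifting.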
Reply by ●April 12, 2016
On 4/11/2016 11:41 PM, robert bristow-johnson wrote:
> On Monday, April 11, 2016 at 2:17:00 AM UTC-4, rickman wrote:
>> On 4/10/2016 11:43 PM, robert bristow-johnson wrote:
>>>
>>> if i were a silicon guy, i would design an audio DSP strictly
>>> without floating point.
> ....
>>> problem is with the Sigma is that it's too much like an FPGA or
>>> some hardware solution. no branch instruction or conditional
>>> branch instruction. so you can't really process samples in
>>> blocks and you can't have processing modes like you need to
>>> efficiently do processing alongside analysis. like what happens
>>> in a pitch shifter.
>>
>> Hmmm... not sure what limitations you think FPGAs have. I think
>> maybe you just don't understand them.
>
> i know you can make a state machine on an FPGA. i'm sure you can
> even implement a Dynamic Memory Controller, a Program Counter, Stack
> Pointer, arithmetic registers (like accumulators),
> ConditionCode/Status Register, and decode opcodes with an FPGA. i'm
> sure you can implement conditional and unconditional JUMP and JSR
> instructions with this FPGA programmed to be a CPU. i'm sure you can
> do that.
>
> but i have developed algs on an ASIC and an FPGA that behaved sorta
> like the ASIC. like the Sigma, all of the instructions were inline
> and executed the list from top instruction to the bottom exactly once
> per sample. none of the hardware designers felt it needed to be any
> different. sometimes folks just can't see past their assumptions
> like the only way to process samples is one sample at a time.
>
> and i dunno zilch about Verilog or VHDL or whatever hardware
> language.

I'm still not following your point. The fact that someone else designed hardware that processed samples one at a time doesn't mean it was that way because of limitations in the hardware. If you want to process a buffer of data, for example to perform an FFT, you can allocate a buffer and collect a full block of data, then perform the FFT on it. No magic, no special features required. Do what you want.

I suspect the hardware you've seen before was the way it was because that is easier to code and it suited the application. But it is far from hard to process data in buffers; you just need the buffers.

HDL is not really so much different from sequentially executed software. Some aspects of HDL are sequential. That generates hardware that runs together (I refrain from saying sequentially because it is often optimized so that it ends up being one block of logic). If you want hardware to operate in parallel, that can be done easily. Any of the parallel constructs can be used to create separate blocks of hardware to run in parallel.

This isn't really a good way to picture HDL, though. I always think in terms of the hardware I wish to have and then describe that in HDL. Others just code the algorithm. But either way, you are not limited in what you can do.

The main difference is that software running on a processor has limitations due to the fact that there is no real parallelism. Parallelism on a sequential processor is emulated by time multiplexing the processor between different tasks.

Hardware is not about the HDL. It's just hardware, and HDL lets you specify your design a bit more easily.

--
Rick
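The "allocate a buffer, collect a full block, then transform it" idea above can be sketched as a behavioural model. This is Python, not HDL, and it says nothing about how the logic would be laid out in an FPGA; `BlockCollector` is a hypothetical name, and a naive DFT stands in for the FFT a real design would use.

```python
import cmath


class BlockCollector:
    """Collect samples one at a time; hand a full block to `process` when ready."""

    def __init__(self, block_size, process):
        self.block_size = block_size
        self.process = process
        self.buffer = []
        self.results = []

    def push(self, sample):
        self.buffer.append(sample)
        if len(self.buffer) == self.block_size:
            self.results.append(self.process(self.buffer))
            self.buffer = []            # start collecting the next block


def dft(block):
    """Naive O(N^2) DFT, used here only as a stand-in for an FFT."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


collector = BlockCollector(4, dft)
for s in [1.0, 0.0, 0.0, 0.0, 1.0, 1.0]:   # six samples: one full block, two pending
    collector.push(s)

print(len(collector.results), len(collector.buffer))  # 1 2
```

Samples still arrive one at a time, but the transform only runs when a block is complete, which is all that block processing requires of the platform: storage for the block and some way to notice that it is full.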
Reply by ●April 12, 2016
Hi Tim,

On 2016-04-09 at 05:54, Tim Wescott wrote:
(...)
> Basically, I know that the Cortex-A7 has the instructions, and goes fast
> enough to do what you want. Whether it's the right choice is for you to
> cypher out.

Have a look at the presentation by Paul Beckmann of DSP Concepts comparing the SHARC and Blackfin signal processors with ARM's Cortex-M and Cortex-A.

http://www.dspconcepts.com/sites/default/files/white-papers/PD8_Beckmann.pdf

Best regards

Roman
Reply by ●April 12, 2016
Roman Rumian <rumian_usun_to@agh.edu.pl> writes:
> Hi Tim,
>
> On 2016-04-09 at 05:54, Tim Wescott wrote:
> (...)
>> Basically, I know that the Cortex-A7 has the instructions, and goes fast
>> enough to do what you want. Whether it's the right choice is for you to
>> cypher out.
>
> have a look at Paul Beckmann from DSP Concepts presentation comparing
> SHARC and Blackfin signal processors with ARMs Cortex-M and - A.
>
> http://www.dspconcepts.com/sites/default/files/white-papers/PD8_Beckmann.pdf
>
> Best regards
>
> Roman

Thanks Roman - very useful!

--
Randy Yates, DSP/Embedded Firmware Developer
Digital Signal Labs
http://www.digitalsignallabs.com
Reply by ●April 12, 2016
On Mon, 11 Apr 2016 22:45:37 +0300, Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
>On 11.4.16 22:05, Tim Wescott wrote:
>> On Mon, 11 Apr 2016 18:16:14 +0000, Eric Jacobsen wrote:
>>
>>> Both the Pi and the BBB platforms are very cheap and very capable. I'm
>>> wondering how long dedicated DSP processors will still have a market.
>>
>> I wonder that, too. I have a personal project I'm working on, the
>> processor for which is an ARM Cortex M4 machine, because of the DSP
>> instructions.
>>
>> Of course, I haven't actually TRIED to make that part work yet -- it's a
>> spare time project, and I haven't had a lot of spare time.
>
>
>The Cortex DSP instructions work well. The main difference to
>a real DSP comes from the shuttling of operands to the MAC.
>A DSP pre-fetches next operands during a MAC, but a Cortex needs
>two ldr instructions to get the operands in.

And that's still pretty efficient, especially when the core is running at a higher clock rate than a competing DSP, or you have four of them on a die instead of one. I think some DSPs may allow multiple MACs to execute concurrently, which helps.

The economy of scale seems to favor the general purpose processors, and I wonder if the market for dedicated DSPs will stay large enough to keep them viable.

I also wonder how much of ADI's revenue is from the DSPs...?
Reply by ●April 12, 2016
On Tue, 12 Apr 2016 11:55:16 +0200, Roman Rumian <rumian_usun_to@agh.edu.pl> wrote:
>Hi Tim,
>
>On 2016-04-09 at 05:54, Tim Wescott wrote:
>(...)
>> Basically, I know that the Cortex-A7 has the instructions, and goes fast
>> enough to do what you want. Whether it's the right choice is for you to
>> cypher out.
>
>have a look at Paul Beckmann from DSP Concepts presentation comparing
>SHARC and Blackfin signal processors with ARMs Cortex-M and - A.
>
>http://www.dspconcepts.com/sites/default/files/white-papers/PD8_Beckmann.pdf
>
>Best regards
>
>Roman

That makes the point pretty well.






