Reply by December 11, 2015
On Wednesday, December 9, 2015 at 9:28:23 AM UTC+13, Steve Pope wrote:
> <gyansorova@gmail.com> wrote:
>
> >> On 12/8/2015 12:53 AM, gyansorova@gmail.com wrote:
>
> >>> I was wondering how many of you when sampling signals use
> >>> two concurrent loops. First loop to do the sampling and the
> >>> second for the DSP. You send the data between the two with
> >>> a FIFO. I assume this is way faster than a straight through
> >>> single loop. Of course FPGAs are ok for this but unaware if
> >>> anybody does this on say two processors.
>
> >Two parallel while loops which run independently of each other (not on
> >the same processor)
>
> You can certainly do this. A major complication would be if you
> had to send data in both directions. If it's one direction only
> (from the first processor to the second processor) then the
> second processor can simply poll the first processor.
>
> If things get too hairy you would either need to write a synchronization
> kernel, or buy (or I suppose, write from scratch) an RTOS. Depending on
> your total project flow, using an RTOS right off the bat might
> be wise.
>
> Steve

I was on a course by National Instruments and it appears they have solved all these issues on their CompactRIO. It has an FPGA and an RTOS which communicate through FIFOs, and multiple software loops can also communicate with each other. Some of it is quite easy, but the more exotic side of it appears to be quite challenging to understand. They can monitor the jitter in real time and adjust speeds. Very neat stuff but not at all cheap. One of them is controlling the Large Hadron Collider, apparently. It wouldn't be much use for some applications of course; too expensive.
Reply by Bob Masta December 9, 2015
On Mon, 7 Dec 2015 21:53:47 -0800 (PST),
gyansorova@gmail.com wrote:

>I was wondering how many of you when sampling signals use two concurrent
>loops. First loop to do the sampling and the second for the DSP. You send
>the data between the two with a FIFO. I assume this is way faster than a
>straight through single loop. Of course FPGAs are ok for this but unaware
>if anybody does this on say two processors.

I suspect that these days most everyone is using some form of "two concurrent loops" in the sense that the sampling is treated as a separate process that fills a buffer, and the DSP is handled by a separate process that reads the buffer. Whether these run on separate processors, or separate tasks on a single processor, isn't important to the basic concept.

Either way you need some handshaking to tell when a buffer is full and needs servicing, whether that is passed as an interrupt, DMA request, or software message. The choice will probably depend on "hard" time constraints versus memory use. You can ride out long intervals between buffer services with a large-enough queue of buffers, for example. (Or one really large circular buffer with pointer access.)

Or am I misunderstanding the question?

Best regards,

Bob Masta

DAQARTA v8.00
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
Frequency Counter, Pitch Track, Pitch-to-MIDI
FREE 8-channel Signal Generator, DaqMusiq generator
Science with your sound card!
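For anyone who wants to see the "one large circular buffer with pointer access" idea spelled out, here is a minimal sketch in Python. It is only an illustration under assumptions: the names (RingBuffer, push, pop_block, demo) are invented here, the two "loops" are interleaved in a single thread so the result is repeatable, and the shared count field would need a lock or atomic indices if the two sides really ran on separate threads or processors.

```python
class RingBuffer:
    """Fixed-size circular buffer with separate read and write indices.

    Hypothetical helper for illustration; not thread-safe as written.
    """

    def __init__(self, capacity):
        self.data = [0.0] * capacity
        self.capacity = capacity
        self.write_idx = 0   # advanced only by the sampling side
        self.read_idx = 0    # advanced only by the DSP side
        self.count = 0       # samples currently stored
        self.overruns = 0    # samples dropped because the buffer was full

    def push(self, sample):
        """Sampling side: store one sample, or count an overrun if full."""
        if self.count == self.capacity:
            self.overruns += 1
            return False
        self.data[self.write_idx] = sample
        self.write_idx = (self.write_idx + 1) % self.capacity
        self.count += 1
        return True

    def pop_block(self, n):
        """DSP side: return n samples if that many are waiting, else None."""
        if self.count < n:
            return None
        block = [self.data[(self.read_idx + i) % self.capacity]
                 for i in range(n)]
        self.read_idx = (self.read_idx + n) % self.capacity
        self.count -= n
        return block


def demo():
    """Push 12 samples and service the buffer once every 4 samples."""
    ring = RingBuffer(capacity=8)
    means = []
    for t in range(12):
        ring.push(float(t))                 # "sampling loop" step
        if t % 4 == 3:                      # "DSP loop" step, every 4th tick
            block = ring.pop_block(4)
            means.append(sum(block) / len(block))
    return means, ring.overruns
```

The "handshake" here is just the sample count: the DSP side acts only when a full block is waiting. A larger capacity lets the DSP side be serviced less often, at the cost of memory and latency.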
Reply by glen herrmannsfeldt December 9, 2015
rickman <gnuarm@gmail.com> wrote:

(snip on two processor serial pipelines for DSP)
(then I wrote)

>> I believe that more usual is to do operations in parallel on
>> the two machines. That is, not in pipeline form.

(snip)

>> I suspect one could divide up the FFT, again so that the two
>> processors are doing the same thing on two different parts of the data.

> The FFT does not split cleanly like this. All output data depends on
> all input data and the intermediate calculations are shared. You could
> split the data into two parallel paths with a final step to combine them
> and perform the final pass.

I didn't try too hard, but that is about what I was thinking about.

> This would not scale well with more
> processors requiring more passes to be done at the end for every
> multiple of 2 processors used (or other numbers depending on the radix
> of the butterflies used).

But the OP only asked about two, and didn't give details about the algorithm in question.

> Rather the FFT is best decomposed by passes with the data flowing
> between each pass like a pipeline. So this is not a good example to
> demonstrate your point.

I agree it doesn't scale so well to a larger number of processors.

For a serial pipeline, you need an efficient way to pass data. If you don't have that, the overhead will be worse than the speedup.

Though you should be able to partition a 2D FFT, by rows and then columns, without much trouble.

In case anyone was interested, the paper describing the algorithm used for the Lytro camera starts out with a 4D FFT. That is as far as I remember it, though. Fortunately N isn't so big.

-- glen
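The row-then-column partitioning of a 2D transform mentioned above can be sketched as follows. This is a hedged illustration in plain Python, not anyone's production code: a naive O(N^2) DFT stands in for a real FFT, the function names are invented for the example, and the two "workers" are just two sequential calls marked in comments where separate processors could take over.

```python
import cmath


def dft(x):
    """Naive 1D DFT, standing in for a real FFT routine."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n))
            for j in range(n)]


def dft_rows(rows):
    """Transform every row independently; rows share no data."""
    return [dft(r) for r in rows]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dft2_two_workers(image):
    """2D DFT by rows, then columns, with each pass split in half."""
    half = len(image) // 2
    top = dft_rows(image[:half])        # worker 0: first half of the rows
    bottom = dft_rows(image[half:])     # worker 1: second half of the rows
    cols = transpose(top + bottom)      # the one place data is exchanged
    half = len(cols) // 2
    left = dft_rows(cols[:half])        # worker 0: first half of the columns
    right = dft_rows(cols[half:])       # worker 1: second half of the columns
    return transpose(left + right)


def dft2_direct(image):
    """Direct 2D DFT from the definition, used only to check the result."""
    rows, cols = len(image), len(image[0])
    return [[sum(image[r][c]
                 * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                 for r in range(rows) for c in range(cols))
             for v in range(cols)]
            for u in range(rows)]
```

Within each pass the two halves never touch each other's data, so the only synchronization point is the transpose between the row pass and the column pass.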
Reply by Steve Pope December 8, 2015
glen herrmannsfeldt  <gah@ugcs.caltech.edu> wrote:

>Steve Pope <spope33@speedymail.org> wrote:
>> <gyansorova@gmail.com> wrote:
>>>> On 12/8/2015 12:53 AM, gyansorova@gmail.com wrote:
>>>>> I was wondering how many of you when sampling signals use
>>>>> two concurrent loops. First loop to do the sampling and the
>>>>> second for the DSP. You send the data between the two with
>>>>> a FIFO. I assume this is way faster than a straight through
>>>>> single loop. Of course FPGAs are ok for this but unaware if
>>>>> anybody does this on say two processors.

>>>Two parallel while loops which run independently of each other (not on
>>>the same processor)

>> You can certainly do this. A major complication would be if you
>> had to send data in both directions. If it's one direction only
>> (from the first processor to the second processor) then the
>> second processor can simply poll the first processor.

>I believe that more usual is to do operations in parallel on
>the two machines. That is, not in pipeline form.

Just to clarify, I did not intend to exclude parallel processing in my description above. It sounds like the OP wishes to do some front-end processing in the first processor and some more extensive DSP in the second processor. These can occur simultaneously.

The stated use of "while loops" suggests polling, that is, whenever the second processor has run out of data, it polls the first processor to see if more data is ready. This is less efficient than having the appropriate I/O primitives built into the OS (something like a select() system call is useful) and supported by the hardware, but it still allows parallelism.

Steve
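The difference between polling and a blocking primitive can be sketched like this. It is a rough Python illustration under assumptions: two threads stand in for the two processors, Python's queue.Queue stands in for the FIFO, its blocking get() plays the role of a select()-style wait, and the function names and sleep intervals are arbitrary choices made for the example.

```python
import queue
import threading
import time


def producer(fifo, n):
    """First 'processor': emits n items, then None as an end marker."""
    for i in range(n):
        time.sleep(0.001)          # pretend front-end work
        fifo.put(i)
    fifo.put(None)


def polling_consumer(fifo):
    """Second 'processor', polling: keeps asking whether data is ready."""
    received, empty_polls = [], 0
    while True:
        try:
            item = fifo.get_nowait()
        except queue.Empty:
            empty_polls += 1       # a wasted trip: nothing was ready
            time.sleep(0.0002)
            continue
        if item is None:
            return received, empty_polls
        received.append(item)


def blocking_consumer(fifo):
    """Second 'processor', blocking: sleeps inside get() until data arrives."""
    received = []
    while True:
        item = fifo.get()
        if item is None:
            return received
        received.append(item)


def run(consumer, n=20):
    fifo = queue.Queue()
    t = threading.Thread(target=producer, args=(fifo, n))
    t.start()
    result = consumer(fifo)
    t.join()
    return result
```

Both consumers receive the same data in the same order. The polling version also reports how many polls came back empty, which is the overhead a blocking wait avoids; the exact count depends on scheduling and will vary from run to run.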
Reply by rickman December 8, 2015
On 12/8/2015 6:38 PM, glen herrmannsfeldt wrote:
> Steve Pope <spope33@speedymail.org> wrote:
>> <gyansorova@gmail.com> wrote:
>
>>>> On 12/8/2015 12:53 AM, gyansorova@gmail.com wrote:
>
>>>>> I was wondering how many of you when sampling signals use
>>>>> two concurrent loops. First loop to do the sampling and the
>>>>> second for the DSP. You send the data between the two with
>>>>> a FIFO. I assume this is way faster than a straight through
>>>>> single loop. Of course FPGAs are ok for this but unaware if
>>>>> anybody does this on say two processors.
>
>>> Two parallel while loops which run independently of each other (not on
>>> the same processor)
>
>> You can certainly do this. A major complication would be if you
>> had to send data in both directions. If it's one direction only
>> (from the first processor to the second processor) then the
>> second processor can simply poll the first processor.
>
> I believe that more usual is to do operations in parallel on
> the two machines. That is, not in pipeline form.
>
>> If things get too hairy you would either need to write a synchronization
>> kernel, or buy (or I suppose, write from scratch) an RTOS. Depending on
>> your total project flow, using an RTOS right off the bat might
>> be wise.
>
> One might do a matrix multiply with one processor doing one half of
> the matrix, and the other doing the other half. In that case, there
> is little synchronization to get right.
>
> I suspect one could divide up the FFT, again so that the two
> processors are doing the same thing on two different parts of the data.

The FFT does not split cleanly like this. All output data depends on all input data and the intermediate calculations are shared. You could split the data into two parallel paths with a final step to combine them and perform the final pass. This would not scale well with more processors, requiring more passes to be done at the end for every multiple of 2 processors used (or other numbers depending on the radix of the butterflies used).

Rather the FFT is best decomposed by passes with the data flowing between each pass like a pipeline. So this is not a good example to demonstrate your point.

--

Rick
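The "two parallel paths with a final step to combine them" arrangement can be sketched with a radix-2 decimation-in-time split. This is a minimal Python illustration and an assumption about one way to do it, not a claim about what either poster had in mind: the names are invented, the input length must be a power of two, and the two half-size transforms are simply called one after the other where two processors could run them side by side.

```python
import cmath


def combine(even, odd):
    """Final butterfly pass: merge two half-size transforms into one."""
    half = len(even)
    n = 2 * half
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    if len(x) == 1:
        return [complex(x[0])]
    return combine(fft(x[0::2]), fft(x[1::2]))


def fft_two_workers(x):
    """Split once: each half is independent until the last pass."""
    even = fft(x[0::2])     # worker 0: even-indexed input samples
    odd = fft(x[1::2])      # worker 1: odd-indexed input samples
    return combine(even, odd)   # shared final pass needs both results


def naive_dft(x):
    """Direct DFT from the definition, used only to check the result."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n))
            for j in range(n)]
```

The combine step is where the scaling concern shows up: with four workers there would be two such shared passes at the end, with eight workers three, and so on.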
Reply by glen herrmannsfeldt December 8, 2015
Steve Pope <spope33@speedymail.org> wrote:
> <gyansorova@gmail.com> wrote:
>>> On 12/8/2015 12:53 AM, gyansorova@gmail.com wrote:
>>>> I was wondering how many of you when sampling signals use
>>>> two concurrent loops. First loop to do the sampling and the
>>>> second for the DSP. You send the data between the two with
>>>> a FIFO. I assume this is way faster than a straight through
>>>> single loop. Of course FPGAs are ok for this but unaware if
>>>> anybody does this on say two processors.

>>Two parallel while loops which run independently of each other (not on
>>the same processor)

> You can certainly do this. A major complication would be if you
> had to send data in both directions. If it's one direction only
> (from the first processor to the second processor) then the
> second processor can simply poll the first processor.

I believe that more usual is to do operations in parallel on the two machines. That is, not in pipeline form.

> If things get too hairy you would either need to write a synchronization
> kernel, or buy (or I suppose, write from scratch) an RTOS. Depending on
> your total project flow, using an RTOS right off the bat might
> be wise.

One might do a matrix multiply with one processor doing one half of the matrix, and the other doing the other half. In that case, there is little synchronization to get right.

I suspect one could divide up the FFT, again so that the two processors are doing the same thing on two different parts of the data.

For an FIR, if you wait until enough data is in the input buffer, you can have one compute the even output samples, and the other the odd. Doing it this way easily generalizes to more processors. Not so easy in a serial pipeline form.

-- glen
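The even/odd output split for an FIR can be sketched as below. It is a small Python illustration with invented function names, assuming the whole input block is already in the buffer and treating samples before the start of the block as zero; each "worker" is just one loop iteration here rather than a real processor.

```python
def fir_outputs(x, h, indices):
    """Compute y[n] = sum_k h[k] * x[n - k] for the requested n only.

    Samples before the start of the buffer are treated as zero.
    """
    out = []
    for n in indices:
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        out.append(acc)
    return out


def fir_split(x, h, workers=2):
    """Worker p computes outputs p, p + workers, p + 2 * workers, ..."""
    y = [0.0] * len(x)
    for p in range(workers):
        # Each pass of this loop reads x and h but writes a disjoint set
        # of outputs, so the passes could run on separate processors.
        y[p::workers] = fir_outputs(x, h, range(p, len(x), workers))
    return y
```

With workers=2 this is exactly the even/odd split; raising workers spreads the outputs over more processors without changing the inner loop, which is the sense in which it generalizes easily. Both workers read the same input buffer, so only the input has to be shared, not any intermediate results.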
Reply by Eric Jacobsen December 8, 2015
On Tue, 8 Dec 2015 08:26:05 -0800 (PST), gyansorova@gmail.com wrote:

>On Tuesday, December 8, 2015 at 8:54:04 PM UTC+13, rickman wrote:
>> On 12/8/2015 12:53 AM, gyansorova@gmail.com wrote:
>> > I was wondering how many of you when sampling signals use two concurrent
>> > loops. First loop to do the sampling and the second for the DSP. You send
>> > the data between the two with a FIFO. I assume this is way faster than a
>> > straight through single loop. Of course FPGAs are ok for this but unaware
>> > if anybody does this on say two processors.
>>
>> I guess I'm not clear on what you mean by one or two loops. Do you mean
>> processes?
>>
>> --
>>
>> Rick
>
>Two parallel while loops which run independently of each other (not on the
>same processor)

I'm actually in the middle of doing this sort of thing right now on a vanilla Linux platform where the real-time application is subject to the whims of the OS. Using threads (e.g., POSIX threads) helps a lot, even if there's only one CPU, but if there is more than one CPU it allows the system to distribute the various processes (aka threads) across the available CPUs. Even with just one CPU it can prevent hardware drivers or interface routines from blocking (and sitting on their hands) while time is wasting away that you need to do the processing in the other loop (or other thread).

In my current case it's similar to what you're describing: the routine that collects the samples from the input (ADC, AFE, whatever) gets its own time management and thread, which periodically hands stuff off to another thread that is running compute-intensive processing. It is essentially virtual parallel while() loops, even on one CPU, as the OS allows resources to switch between them rather than allowing one to block the other. If multiple CPUs are available, then the threads can be assigned to CPU resources based on availability.

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
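The sampling-thread-hands-off-to-processing-thread arrangement described above might look roughly like this. It is a sketch in Python rather than POSIX threads in C, and everything specific in it is an assumption made for illustration: the block size, the fake read_sample callback standing in for an ADC driver, the mean-square "DSP", and the function names.

```python
import queue
import threading

BLOCK = 4   # samples handed off at a time (arbitrary for this sketch)


def sampling_loop(read_sample, n_samples, fifo):
    """Collect samples and hand off full blocks; never does any DSP."""
    block = []
    for i in range(n_samples):
        block.append(read_sample(i))    # stand-in for a blocking ADC read
        if len(block) == BLOCK:
            fifo.put(block)             # blocks only if the FIFO is full
            block = []
    fifo.put(None)                      # end marker; partial block is dropped


def processing_loop(fifo, results):
    """Compute-heavy side: here just the mean square of each block."""
    while True:
        block = fifo.get()              # sleeps until a block is available
        if block is None:
            break
        results.append(sum(v * v for v in block) / len(block))


def run(n_samples=16):
    fifo = queue.Queue(maxsize=8)       # bounded, so a stalled consumer shows up
    results = []
    sampler = threading.Thread(
        target=sampling_loop,
        args=(lambda i: float(i % 4), n_samples, fifo))
    worker = threading.Thread(target=processing_loop, args=(fifo, results))
    sampler.start()
    worker.start()
    sampler.join()
    worker.join()
    return results
```

While one thread is waiting inside a read or a get(), the OS is free to run the other, which is the point made above about one loop not blocking the other even on a single CPU.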
Reply by Steve Pope December 8, 2015
<gyansorova@gmail.com> wrote:

>> On 12/8/2015 12:53 AM, gyansorova@gmail.com wrote:
>>> I was wondering how many of you when sampling signals use
>>> two concurrent loops. First loop to do the sampling and the
>>> second for the DSP. You send the data between the two with
>>> a FIFO. I assume this is way faster than a straight through
>>> single loop. Of course FPGAs are ok for this but unaware if
>>> anybody does this on say two processors.

>Two parallel while loops which run independently of each other (not on
>the same processor)

You can certainly do this. A major complication would be if you had to send data in both directions. If it's one direction only (from the first processor to the second processor) then the second processor can simply poll the first processor.

If things get too hairy you would either need to write a synchronization kernel, or buy (or I suppose, write from scratch) an RTOS. Depending on your total project flow, using an RTOS right off the bat might be wise.

Steve
Reply by Tim Wescott December 8, 2015
On Mon, 07 Dec 2015 21:53:47 -0800, gyansorova wrote:

> I was wondering how many of you when sampling signals use two concurrent
> loops. First loop to do the sampling and the second for the DSP. You
> send the data between the two with a FIFO. I assume this is way faster
> than a straight through single loop. Of course FPGAs are ok for this but
> unaware if anybody does this on say two processors.

Given the wide range of answers you're getting, and the puzzlement that accompanies them -- perhaps you could tell us more, or even what your real question is?

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Reply by Les Cargill December 8, 2015
gyansorova@gmail.com wrote:
> I was wondering how many of you when sampling signals use two
> concurrent loops. First loop to do the sampling and the second for
> the DSP. You send the data between the two with a FIFO. I assume this
> is way faster than a straight through single loop. Of course FPGAs
> are ok for this but unaware if anybody does this on say two
> processors.
>

Results will vary. There is also nothing to keep you from interleaving the two "loops" in one, say, "while (1) ..." loop for a single processor.

"Two processors" is slightly ambiguous; it could be a dual-core, in which case some things are still shared that you'll have to accommodate.

You kinda want the sampling part to be something like DMA.

Much also depends on the interface to the sampling hardware. A 'loop' could be behind a device driver facade.

The way to know what works is to instrument the code with at least counters that are incremented to show when something is not right.

--
Les Cargill
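As a rough illustration of interleaving both "loops" in one loop and instrumenting it with counters, here is a small Python sketch. The function name, the FIFO depth, the doubling "DSP" step and the counter names are all invented for the example; on real hardware the sampling step would be a driver or DMA completion rather than a list of numbers.

```python
from collections import deque


def run_single_loop(samples, fifo_depth=4, dsp_every=1):
    """One loop doing both jobs, with counters for anything that goes wrong.

    dsp_every > 1 models a DSP step that cannot keep up with the sampling.
    """
    fifo = deque()
    counters = {"sampled": 0, "processed": 0, "overruns": 0, "underruns": 0}
    outputs = []
    for tick, sample in enumerate(samples):
        # Sampling part: one new sample arrives every tick.
        if len(fifo) < fifo_depth:
            fifo.append(sample)
            counters["sampled"] += 1
        else:
            counters["overruns"] += 1      # FIFO full: sample was lost

        # DSP part: only gets to run every dsp_every ticks.
        if tick % dsp_every == dsp_every - 1:
            if fifo:
                outputs.append(2 * fifo.popleft())
                counters["processed"] += 1
            else:
                counters["underruns"] += 1  # DSP ran with nothing to do
    return outputs, counters
```

If the overrun counter ever moves, the DSP side is too slow or the FIFO is too shallow for the gaps between services, and that is visible without needing a debugger or a scope on the system.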