DSPRelated.com
Forums

How to downsample this signal?

Started by Piotr Wyderski May 13, 2017
On 15.05.2017 1:42, Piotr Wyderski wrote:
> BTW, the processor is not as sledgehammer as it may sound
Then, have you considered a cascade of halfband low-pass FIR filters? You would still need more multiplies per input sample than with IIR filters, but yet it's a very nice option. Also, implementing a halfband decimating low-pass FIR filter is the easiest possible way to introduce yourself to polyphase techniques. Read some fred harris or Lyon's "Understanding DSP", and you are fine. Gene
On Saturday, May 13, 2017 at 3:23:24 AM UTC-4, Piotr Wyderski wrote:
> Hello, > > I have the 50Hz mains voltage signal sampled at 12bit/100kHz. This high > frequency is for rapid detection of overvoltage conditions. However, for > further processing this sample rate is way too high, so I'd like to > change the sampling rate by a factor ~100 (not a very critical value, > just "sensibly low"). > >
then the upper limit of transient frequencies that you see will be limited to about 500 Hz. if you want to "see" transients with frequencies above 500 Hz, maybe its better to let them alias and see something is there ...depends on what you are trying to do. m
Piotr Wyderski <peter.pan@neverland.mil> writes:

> Thank you all for your input. > > Randy Yates wrote: > >> What I don't understand is why, with the sledgehammer processor >> you have, you don't do a decent polyphase filter with a good >> FIR? > > It is a hobby project and I had 15 years of break with any form > of DSP, at least understood as *signal* processing. I believe > I still understand the math behind it, but I know only a limited > number of "tricks". Polyphase representation is totally new to > me, so are its applications to decimation/interpolation. > > The FIR filter has many advantages, e.g. maintaining linear phase > will be easy, but the "direct" approach implied a FIR of insanely > high order, so I didn't pursue the idea and drifted towards IIR, > with which I had some positive experience, especially with CICs > used in my SDR project. > > As far as I can see, the polyphase FIR approach should be the right > choice for me. I am still not sure whether I am able to design a > filter in this topology correctly, but well, it's what we call > "learning". Thank you for bringing it to my attention.
Hey, you are most welcome. "Polyphase filtering" sounds really complicated, but (like many things in this field) it's just a term for something quite simple: don't compute values you're just going to throw away. For example, if you are decimating by 8, you ultimately throw away 7 out of every 8 samples. So if you're throwing them away, don't compute them in the first place. Duh! Armed with this technique, you essentially increase the effective number of filter taps by M, where M is the decimation factor. So if you could afforst 200 taps at full rate, you could actually afford 8*200 = 1600 taps when decimating by 8.
> > BTW, the processor is not as sledgehammer as it may sound, especially > due to the very limited RAM resources available to the DSP unit (2 banks > of 128 24-bit words each, period.). The main processor (an ARM) has > everything needed to the job, except of low latency guarantees, which > is enough to kick it out of the scene. Whatever I can do, I must do > using the DFB block. Currently the unknown is the definition of > "whatever", but I'm aiming at saturating the block.
Hmm. Well then you may be up against a wall. Also if you have a requirement for low-latency, a long FIR is out. But note that even though CICs can be considered IIR, they also can be considered a simple boxcar averager so they will have N/2 latency for a length-N boxcar. I think there are ways to compute polyphase IIRs too but I've never done that. PS: Rick has some good material in his book on polyphase filtering, CIC decimators, etc. @BOOK{lyonsthird, title = "{Understanding Digital Signal Processing}", edition = "third", author = "{Richard~G.~Lyons}", publisher = "Prentice Hall", year = "2011"} Good luck! -- Randy Yates, Embedded Firmware Developer Garner Underground, Inc. 866-260-9040, x3901 http://www.garnerundergroundinc.com
On 5/15/2017 1:21 AM, Piotr Wyderski wrote:
> rickman wrote: > >> Are the 248 words for data or also program? > > Data only, there are dedicated code memories. The control part > is totally crazy: there are two separate code memories working > in parallel (they take both paths simultaneosly in order to > provide instructions with no delay in the case of a branch). > But there is no such thing as a program counter, the code > memories store instructions at consecutive addresses without > any deeper structure. A run of instructions is called a block > and is ended with the presence of a jump instruction. There > is also a third control memory which stores the information > where the i-th block begins and to which j-th state the FSM > should go in the case of a branch. In short, hardware basic > blocks. The data path is a VLIW with exposed pipelining, > which adds fun. > >> That should be plenty of room for coefficients. > > But doesn't the FIR require a lot of cells for > the past data values?
Like I said, I wrote this a long time ago, so I am not the resource to be asking. I can't picture how it works, but I specifically remember NOT needing to store previous data inputs. I am thinking you store outputs which will be fewer because of the decimation. As the input data comes in the output samples can be built up and when one is complete it is outputted and replaced by a new one. But read a proper reference. If I wasn't busy today I'd dig this up for you. -- Rick C
rickman <gnuarm@gmail.com> writes:

> On 5/15/2017 1:21 AM, Piotr Wyderski wrote: >> rickman wrote: >> >>> Are the 248 words for data or also program? >> >> Data only, there are dedicated code memories. The control part >> is totally crazy: there are two separate code memories working >> in parallel (they take both paths simultaneosly in order to >> provide instructions with no delay in the case of a branch). >> But there is no such thing as a program counter, the code >> memories store instructions at consecutive addresses without >> any deeper structure. A run of instructions is called a block >> and is ended with the presence of a jump instruction. There >> is also a third control memory which stores the information >> where the i-th block begins and to which j-th state the FSM >> should go in the case of a branch. In short, hardware basic >> blocks. The data path is a VLIW with exposed pipelining, >> which adds fun. >> >>> That should be plenty of room for coefficients. >> >> But doesn't the FIR require a lot of cells for >> the past data values? > > Like I said, I wrote this a long time ago, so I am not the resource to > be asking. I can't picture how it works, but I specifically remember > NOT needing to store previous data inputs. I am thinking you store > outputs which will be fewer because of the decimation. As the input > data comes in the output samples can be built up and when one is > complete it is outputted and replaced by a new one. > > But read a proper reference. If I wasn't busy today I'd dig this up > for you.
Yes you do need to store N-1 previous inputs (plus 1 current input) for a length N polyphase FIR decimation by M. Just think first principles: for any output y[k], you need N inputs to compute it, even if k = n*M. However, you don't have to input and shift every input as you do for a normal-rate FIR. -- Randy Yates, Embedded Firmware Developer Garner Underground, Inc. 866-260-9040, x3901 http://www.garnerundergroundinc.com
makolber@yahoo.com wrote:

> then the upper limit of transient frequencies that you see will be limited to about 500 Hz.
No, the range checking will be performed on the high bandwidth stream, as a part of sample inception procedure. I care about U(t), not about dU/dt. Then this stream should be decimated for less demanding tasks. Best regards, Piotr
Randy Yates <randyy@garnerundergroundinc.com> writes:

> rickman <gnuarm@gmail.com> writes: > >> On 5/15/2017 1:21 AM, Piotr Wyderski wrote: >>> rickman wrote: >>> >>>> Are the 248 words for data or also program? >>> >>> Data only, there are dedicated code memories. The control part >>> is totally crazy: there are two separate code memories working >>> in parallel (they take both paths simultaneosly in order to >>> provide instructions with no delay in the case of a branch). >>> But there is no such thing as a program counter, the code >>> memories store instructions at consecutive addresses without >>> any deeper structure. A run of instructions is called a block >>> and is ended with the presence of a jump instruction. There >>> is also a third control memory which stores the information >>> where the i-th block begins and to which j-th state the FSM >>> should go in the case of a branch. In short, hardware basic >>> blocks. The data path is a VLIW with exposed pipelining, >>> which adds fun. >>> >>>> That should be plenty of room for coefficients. >>> >>> But doesn't the FIR require a lot of cells for >>> the past data values? >> >> Like I said, I wrote this a long time ago, so I am not the resource to >> be asking. I can't picture how it works, but I specifically remember >> NOT needing to store previous data inputs. I am thinking you store >> outputs which will be fewer because of the decimation. As the input >> data comes in the output samples can be built up and when one is >> complete it is outputted and replaced by a new one. >> >> But read a proper reference. If I wasn't busy today I'd dig this up >> for you. > > Yes you do need to store N-1 previous inputs (plus 1 current input) for > a length N polyphase FIR decimation by M. Just think first principles: > for any output y[k], you need N inputs to compute it, even if k = n*M.
Correction: even if n = k*M, or k = floor(n/M).
> > However, you don't have to input and shift every input as you do for a > normal-rate FIR.
-- Randy Yates, Embedded Firmware Developer Garner Underground, Inc. 866-260-9040, x3901 http://www.garnerundergroundinc.com
On 5/15/2017 11:49 AM, Randy Yates wrote:
> rickman <gnuarm@gmail.com> writes: > >> On 5/15/2017 1:21 AM, Piotr Wyderski wrote: >>> rickman wrote: >>> >>>> Are the 248 words for data or also program? >>> >>> Data only, there are dedicated code memories. The control part >>> is totally crazy: there are two separate code memories working >>> in parallel (they take both paths simultaneosly in order to >>> provide instructions with no delay in the case of a branch). >>> But there is no such thing as a program counter, the code >>> memories store instructions at consecutive addresses without >>> any deeper structure. A run of instructions is called a block >>> and is ended with the presence of a jump instruction. There >>> is also a third control memory which stores the information >>> where the i-th block begins and to which j-th state the FSM >>> should go in the case of a branch. In short, hardware basic >>> blocks. The data path is a VLIW with exposed pipelining, >>> which adds fun. >>> >>>> That should be plenty of room for coefficients. >>> >>> But doesn't the FIR require a lot of cells for >>> the past data values? >> >> Like I said, I wrote this a long time ago, so I am not the resource to >> be asking. I can't picture how it works, but I specifically remember >> NOT needing to store previous data inputs. I am thinking you store >> outputs which will be fewer because of the decimation. As the input >> data comes in the output samples can be built up and when one is >> complete it is outputted and replaced by a new one. >> >> But read a proper reference. If I wasn't busy today I'd dig this up >> for you. > > Yes you do need to store N-1 previous inputs (plus 1 current input) for > a length N polyphase FIR decimation by M. Just think first principles: > for any output y[k], you need N inputs to compute it, even if k = n*M.
I am pretty sure that conclusion is not correct. When I was asked to implement a polyphase filter that was the part I had trouble getting my head around but finally understood how it worked. At the time this was being done on a very early DSP chip with very limited memory, so storing the N-1 previous samples was not an option. The guy presenting the problem to me was the classic engineer who could read and understand things, but couldn't explain anything to other people. So I had to take the paper he gave me and finally figure it out for myself. I will dig around when I get a chance and figure out exactly how this worked. But the basic idea is that instead of saving multiple inputs and calculating the outputs one at a time (iterating over the stored inputs) - as each input comes in it is iterated over the stored outputs and each output is ... well output, when it has been fully calculated. Since there are fewer outputs than inputs (by the decimation factor of M) you store fewer outputs than you would inputs. This is not hard to understand if you just stop thinking it *has* to be the way you are currently seeing it. (shown here with simplified notation since it is too messy to show the indexes with decimation) y(k) = a(0)x(i) + a(1)x(i-1) + a(2)x(i-2) + ... y(k+1) = a(0)x(i+1) + a(1)x(i) + a(2)x(i-1) + ... y(k+2) = a(0)x(i+2) + a(1)x(i+1) + a(2)x(i) + ... y(k+3) = a(0)x(i+3) + a(1)x(i+2) + a(2)x(i+1) + ... Instead of thinking you have to calculate all the terms of y(k) at one time (and so store the last N*M values of x), consider that when x(i) arrives, you can calculate the x(i) terms and add them into y(k) (which is then output) and y(k+1), y(k+2)... This allows the storage of fewer values by a factor of M. It does require more memory fetches than the standard MAC calculation since you have to fetch not only the coefficient and the input, but also the intermediate output value being calculated on each MAC operation. Maybe I won't have to dig it up. I'm pretty sure this is how it worked. It ties in nicely with the "polyphase" aspect to allow a minimum number of calculations on arrival of each input since not all the coefficients are used on each input. So the coefficients are divided into "phases" (N/M coefficients) with only one phase used for any given input sample. -- Rick C
On Tue, 16 May 2017 02:59:51 -0400, rickman <gnuarm@gmail.com> wrote:

>On 5/15/2017 11:49 AM, Randy Yates wrote: >> rickman <gnuarm@gmail.com> writes: >> >>> On 5/15/2017 1:21 AM, Piotr Wyderski wrote: >>>> rickman wrote: >>>> >>>>> Are the 248 words for data or also program? >>>> >>>> Data only, there are dedicated code memories. The control part >>>> is totally crazy: there are two separate code memories working >>>> in parallel (they take both paths simultaneosly in order to >>>> provide instructions with no delay in the case of a branch). >>>> But there is no such thing as a program counter, the code >>>> memories store instructions at consecutive addresses without >>>> any deeper structure. A run of instructions is called a block >>>> and is ended with the presence of a jump instruction. There >>>> is also a third control memory which stores the information >>>> where the i-th block begins and to which j-th state the FSM >>>> should go in the case of a branch. In short, hardware basic >>>> blocks. The data path is a VLIW with exposed pipelining, >>>> which adds fun. >>>> >>>>> That should be plenty of room for coefficients. >>>> >>>> But doesn't the FIR require a lot of cells for >>>> the past data values? >>> >>> Like I said, I wrote this a long time ago, so I am not the resource to >>> be asking. I can't picture how it works, but I specifically remember >>> NOT needing to store previous data inputs. I am thinking you store >>> outputs which will be fewer because of the decimation. As the input >>> data comes in the output samples can be built up and when one is >>> complete it is outputted and replaced by a new one. >>> >>> But read a proper reference. If I wasn't busy today I'd dig this up >>> for you. >> >> Yes you do need to store N-1 previous inputs (plus 1 current input) for >> a length N polyphase FIR decimation by M. Just think first principles: >> for any output y[k], you need N inputs to compute it, even if k = n*M. > >I am pretty sure that conclusion is not correct. When I was asked to >implement a polyphase filter that was the part I had trouble getting my >head around but finally understood how it worked. At the time this was >being done on a very early DSP chip with very limited memory, so storing >the N-1 previous samples was not an option. The guy presenting the >problem to me was the classic engineer who could read and understand >things, but couldn't explain anything to other people. So I had to take >the paper he gave me and finally figure it out for myself. > >I will dig around when I get a chance and figure out exactly how this >worked. But the basic idea is that instead of saving multiple inputs >and calculating the outputs one at a time (iterating over the stored >inputs) - as each input comes in it is iterated over the stored outputs >and each output is ... well output, when it has been fully calculated. >Since there are fewer outputs than inputs (by the decimation factor of >M) you store fewer outputs than you would inputs. > >This is not hard to understand if you just stop thinking it *has* to be >the way you are currently seeing it. (shown here with simplified >notation since it is too messy to show the indexes with decimation) > >y(k) = a(0)x(i) + a(1)x(i-1) + a(2)x(i-2) + ... >y(k+1) = a(0)x(i+1) + a(1)x(i) + a(2)x(i-1) + ... >y(k+2) = a(0)x(i+2) + a(1)x(i+1) + a(2)x(i) + ... >y(k+3) = a(0)x(i+3) + a(1)x(i+2) + a(2)x(i+1) + ... > >Instead of thinking you have to calculate all the terms of y(k) at one >time (and so store the last N*M values of x), consider that when x(i) >arrives, you can calculate the x(i) terms and add them into y(k) (which >is then output) and y(k+1), y(k+2)... This allows the storage of fewer >values by a factor of M. It does require more memory fetches than the >standard MAC calculation since you have to fetch not only the >coefficient and the input, but also the intermediate output value being >calculated on each MAC operation. > >Maybe I won't have to dig it up. I'm pretty sure this is how it worked. > It ties in nicely with the "polyphase" aspect to allow a minimum >number of calculations on arrival of each input since not all the >coefficients are used on each input. So the coefficients are divided >into "phases" (N/M coefficients) with only one phase used for any given >input sample. > >-- > >Rick C
You either need to store N-1 previous input samples or you need enough running partial product accumulators with the proper coefficients selected at each input. I usually do the latter, but either works. --- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus
rickman <gnuarm@gmail.com> writes:

> On 5/15/2017 11:49 AM, Randy Yates wrote: >> rickman <gnuarm@gmail.com> writes: >> >>> On 5/15/2017 1:21 AM, Piotr Wyderski wrote: >>>> rickman wrote: >>>> >>>>> Are the 248 words for data or also program? >>>> >>>> Data only, there are dedicated code memories. The control part >>>> is totally crazy: there are two separate code memories working >>>> in parallel (they take both paths simultaneosly in order to >>>> provide instructions with no delay in the case of a branch). >>>> But there is no such thing as a program counter, the code >>>> memories store instructions at consecutive addresses without >>>> any deeper structure. A run of instructions is called a block >>>> and is ended with the presence of a jump instruction. There >>>> is also a third control memory which stores the information >>>> where the i-th block begins and to which j-th state the FSM >>>> should go in the case of a branch. In short, hardware basic >>>> blocks. The data path is a VLIW with exposed pipelining, >>>> which adds fun. >>>> >>>>> That should be plenty of room for coefficients. >>>> >>>> But doesn't the FIR require a lot of cells for >>>> the past data values? >>> >>> Like I said, I wrote this a long time ago, so I am not the resource to >>> be asking. I can't picture how it works, but I specifically remember >>> NOT needing to store previous data inputs. I am thinking you store >>> outputs which will be fewer because of the decimation. As the input >>> data comes in the output samples can be built up and when one is >>> complete it is outputted and replaced by a new one. >>> >>> But read a proper reference. If I wasn't busy today I'd dig this up >>> for you. >> >> Yes you do need to store N-1 previous inputs (plus 1 current input) for >> a length N polyphase FIR decimation by M. Just think first principles: >> for any output y[k], you need N inputs to compute it, even if k = n*M. > > I am pretty sure that conclusion is not correct. When I was asked to > implement a polyphase filter that was the part I had trouble getting > my head around but finally understood how it worked. At the time this > was being done on a very early DSP chip with very limited memory, so > storing the N-1 previous samples was not an option. The guy > presenting the problem to me was the classic engineer who could read > and understand things, but couldn't explain anything to other people. > So I had to take the paper he gave me and finally figure it out for > myself. > > I will dig around when I get a chance and figure out exactly how this > worked. But the basic idea is that instead of saving multiple inputs > and calculating the outputs one at a time (iterating over the stored > inputs) - as each input comes in it is iterated over the stored > outputs and each output is ... well output, when it has been fully > calculated. Since there are fewer outputs than inputs (by the > decimation factor of M) you store fewer outputs than you would inputs. > > This is not hard to understand if you just stop thinking it *has* to > be the way you are currently seeing it. (shown here with simplified > notation since it is too messy to show the indexes with decimation) > > y(k) = a(0)x(i) + a(1)x(i-1) + a(2)x(i-2) + ... > y(k+1) = a(0)x(i+1) + a(1)x(i) + a(2)x(i-1) + ... > y(k+2) = a(0)x(i+2) + a(1)x(i+1) + a(2)x(i) + ... > y(k+3) = a(0)x(i+3) + a(1)x(i+2) + a(2)x(i+1) + ... > > Instead of thinking you have to calculate all the terms of y(k) at one > time (and so store the last N*M values of x), consider that when x(i) > arrives, you can calculate the x(i) terms and add them into y(k) > (which is then output) and y(k+1), y(k+2)... This allows the storage > of fewer values by a factor of M. It does require more memory fetches > than the standard MAC calculation since you have to fetch not only the > coefficient and the input, but also the intermediate output value > being calculated on each MAC operation. > > Maybe I won't have to dig it up. I'm pretty sure this is how it > worked. It ties in nicely with the "polyphase" aspect to allow a > minimum number of calculations on arrival of each input since not all > the coefficients are used on each input. So the coefficients are > divided into "phases" (N/M coefficients) with only one phase used for > any given input sample.
Counter-example: Consider M = 8, N = 8 (8 coefficients and decimating by 8). Then each output y(k) depends on x(k*M + n), 0 < n < M - 1, and thus the sets of input values for different outputs are disjoint. -- Randy Yates, Embedded Firmware Developer Garner Underground, Inc. 866-260-9040, x3901 http://www.garnerundergroundinc.com