
Technical discussions about the TI C6000 DSPs (including the c62x, c64x and c67x DSPs).
|
Hello everybody,

Could anyone tell me why none of TI's libraries has an FIR function that filters just one float sample at a time?

I have an algorithm that works on a sample-by-sample basis and needs to FIR-filter float data several times at different stages, so I need optimized code for that FIR to meet the timing.

For now I'm writing that FIR in assembly myself, but I'm wondering why TI's libraries have no functions of this type. All I see are "block FIR" functions that calculate a multiple of 4 samples at a time. Why is that?
|
|
|
At 09:37 AM 9/15/2003, Wojciech Rewers wrote:
>All I see is "block FIR" functions that calculate a multiple of 4 samples
>at a time - why is that?

I have not looked at the code, but it is most likely because the code is optimized for the multiple-MAC architecture. It is probably easier to calculate several output samples together than to use the parallel MACs on a single output sample. When several outputs are calculated together, each coefficient can be fetched once for the whole group.

I expect you will find that your code is speed-limited by fetching the input data and coefficients. I don't see how you can use more than one MAC. Let us know how it works out.

Rick Collins

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design
http://www.arius.com
4 King Ave, Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX
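
The point about fetching each coefficient once per group of outputs can be illustrated with a plain-C sketch. This is not TI's library code: the name `fir_block4`, the argument order, and the convention that `h[0]` pairs with the newest sample are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical block FIR that produces four outputs per pass.
 * x  : nr + nh - 1 input samples, oldest first
 * h  : nh coefficients, h[0] pairs with the newest sample of each window
 * r  : nr output samples
 * nr : number of outputs, assumed to be a multiple of 4
 */
void fir_block4(const float *x, const float *h, float *r, int nh, int nr)
{
    int i, k;

    for (i = 0; i < nr; i += 4) {
        float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;

        for (k = 0; k < nh; k++) {
            /* One coefficient load ... */
            float c = h[k];
            const float *p = &x[i + nh - 1 - k];

            /* ... feeds four independent multiply-accumulates. */
            a0 += c * p[0];
            a1 += c * p[1];
            a2 += c * p[2];
            a3 += c * p[3];
        }
        r[i]     = a0;
        r[i + 1] = a1;
        r[i + 2] = a2;
        r[i + 3] = a3;
    }
}
```

Each pass of the inner loop reads one coefficient and four neighbouring inputs, so the ratio of loads to multiplies is lower than when the same four outputs are computed one at a time.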
|
|
|
--- Arius - Rick Collins wrote:
> I expect you will find that your code will be speed limited by the fetching
> of the input data and coefficients. I don't see how you can use more than
> one MAC. Let us know how it works out.

Well, Rick, I definitely agree with you in general. I understand that the architecture may be used more efficiently when calculating 4 samples "in parallel" than when calculating those 4 samples "in series". But what if one needs just one sample? I'm not expecting that calculating 1 sample instead of 4 will be 4 times faster, but I'd expect it could be 2 times faster, which would still help a lot.

Anyway, another person replied directly to my e-mail and suggested using the dotprod function from TI's library. That is a good point, but then I'm still left with the problem of shifting the whole x vector after receiving each sample, right? With 256 coefficients in my FIR, shifting those 256 samples takes significant time. Are there any tricks for that?

So I was thinking about combining those two processes. Ultimately, what I need is a function like:

float fir(float *x, int x_index, float *h, int nh);

where I could have a single x vector addressed circularly and pass the index of the most recent sample in that vector. That's what I'm trying to write right now. Am I reinventing the wheel?
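
A minimal, unoptimized C sketch of the function described above might look like the following. The prototype follows the one in the post, with `const` added. The coefficient ordering (`h[0]` applied to the newest sample) and the helper `fir_push` are assumptions for illustration, not part of any TI library.

```c
#include <stddef.h>

/* Hypothetical single-sample FIR over a circular delay line.
 * x       : delay line holding the last nh input samples
 * x_index : position of the most recent sample in x
 * h       : coefficients, h[0] applies to the newest sample (assumed)
 * nh      : number of taps, also the length of x
 */
float fir(const float *x, int x_index, const float *h, int nh)
{
    float acc = 0.0f;
    int idx = x_index;
    int k;

    for (k = 0; k < nh; k++) {
        acc += h[k] * x[idx];
        /* Step back one sample in time, wrapping at the start. */
        idx = (idx == 0) ? nh - 1 : idx - 1;
    }
    return acc;
}

/* Hypothetical helper: overwrite the oldest sample with a new one
 * and return the new index of the most recent sample. */
int fir_push(float *x, int x_index, int nh, float sample)
{
    int next = (x_index + 1 == nh) ? 0 : x_index + 1;

    x[next] = sample;
    return next;
}
```

Only one element of x is written per input sample, so the cost of shifting 256 samples disappears. The wrap test inside the loop is the part a hand-optimized version would want to remove.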
|
|
|
At 11:10 AM 9/15/2003, Wojciech Rewers wrote:
>ultimately - what I need is a function like:
>
>float fir(float *x, int x_index, float *h, int nh);
>
>where I could just have one x vector addressed circularly and I'd be
>passing the index of the most recent sample in that vector that's what
>I'm trying to write right now - am I reinventing the wheel?

I am not familiar with the library code. But I do know that the type of circular buffer you need is not addressed the same way as a standard buffer. So I think you will need to write your own code unless you can find a routine elsewhere. Surely you are not the first to need this.

If you have the source for the dot product routine, it should be a simple matter to change the addressing mode for the x vector.

Rick Collins
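
A different trick, not suggested anywhere in this thread, avoids changing the addressing at all: store every sample twice, one filter length apart, so that the most recent samples always sit in one contiguous run that an ordinary dot product can read. The sketch below assumes a small fixed filter length `NH`, reversed coefficient storage, and a stand-in `dotprod` function. All names are hypothetical.

```c
#include <stddef.h>

#define NH 4 /* assumed number of taps for this sketch */

/* Hypothetical delay line in which each sample is stored twice. */
typedef struct {
    float buf[2 * NH]; /* copies live at pos and pos + NH */
    int   pos;         /* next write position, 0 .. NH-1 */
} mirror_delay;

/* Stand-in for a library dot product over two contiguous arrays. */
static float dotprod(const float *a, const float *b, int n)
{
    float acc = 0.0f;
    int k;

    for (k = 0; k < n; k++)
        acc += a[k] * b[k];
    return acc;
}

/* Push one sample and return one filtered output.
 * h_rev holds the coefficients reversed: h_rev[NH-1] multiplies
 * the newest sample. */
float mirror_fir(mirror_delay *d, const float *h_rev, float sample)
{
    d->buf[d->pos]      = sample;
    d->buf[d->pos + NH] = sample;
    d->pos = (d->pos + 1 == NH) ? 0 : d->pos + 1;

    /* buf[pos .. pos+NH-1] now holds the last NH samples, oldest first. */
    return dotprod(&d->buf[d->pos], h_rev, NH);
}
```

The price is twice the memory for the delay line and one extra store per sample. In exchange, the inner loop needs no wrap handling.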
|
Wojciech-
> anyway - another person replied to my post directly to my e-mail and
> suggested to use the dotprod function from TI's library...that is a good
> point, but then - I'm still left with a problem of shifting the whole x
> vector after receiving each sample - right? and having a 256 coefs in my
> FIR - shifting those 256 samples takes significant time... are there any
> tricks for that?

Try taking the dot-product source code and converting it into a wr_dotprod() function that makes the pointers circular with respect to a length and a "base pointer" (a new parameter).

-Jeff
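
As a rough sketch of that suggestion in C: the name wr_dotprod comes from the post, but the parameter list, the backwards walk through x, and the helper `dotprod_rev` are assumptions. The sketch splits the circular window into at most two straight runs, so each inner loop is an ordinary dot product with no wrap test.

```c
#include <stddef.h>

/* Hypothetical helper: sum of h[k] * newest[-k] for k = 0 .. n-1. */
static float dotprod_rev(const float *newest, const float *h, int n)
{
    float acc = 0.0f;
    int k;

    for (k = 0; k < n; k++)
        acc += h[k] * newest[-k];
    return acc;
}

/* Hypothetical circular dot product.
 * x_base : start of the circular buffer (the "base pointer")
 * x_len  : length of the circular buffer in samples
 * x_ptr  : points at the newest sample inside the buffer
 * h      : coefficients, h[0] pairs with the newest sample
 * nh     : number of products, assumed nh <= x_len
 */
float wr_dotprod(const float *x_base, int x_len,
                 const float *x_ptr, const float *h, int nh)
{
    /* Samples reachable walking backwards before hitting the base. */
    int n1 = (int)(x_ptr - x_base) + 1;
    float acc;

    if (n1 > nh)
        n1 = nh;

    acc = dotprod_rev(x_ptr, h, n1);
    if (n1 < nh) {
        /* Wrap once and continue from the last element of the buffer. */
        acc += dotprod_rev(x_base + x_len - 1, h + n1, nh - n1);
    }
    return acc;
}
```

For example, with x = {1, 2, 3, 4}, x_ptr pointing at x[1] and h = {1, 10, 100, 1000}, the samples are visited in the order 2, 1, 4, 3.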
|
I agree with everyone's comments on why there is no sample-by-sample FIR in the TI library. But other companies and authors do offer similar algorithms. Rulph Chassaing's recent book on the C6711 DSK has one, and Integrated DSP has one on their website:

http://www.integrated-dsp.com/software/c6xdsk/C6711/fira/

Good luck...

--- Wojciech Rewers wrote:
> could anyone tell me why there is no fir function to filter just one
> float sample at a time in any of ti's libraries?
|
|
|
--- J G wrote:
> Rulph Chassaing's recent book on the C6711DSK has one and Integrated DSP
> has one on their website:
> http://www.integrated-dsp.com/software/c6xdsk/C6711/fira/

Thanks a lot! That seems like just what I was looking for :-)