DSPRelated.com
Forums

Convolution

Started by Unknown February 6, 2006
Hi

Im currently in a bit of muddle.  Im currently doing an assignment (in
c on TMS320C6713 DSK) and we have been told clearly that we cannot use
FFT and we must use Convolution.

At this point im reading lots of web sites (TI.Com) etc and i have
found a convolution.c example program.  However when i read this
example they are implementing convolution whilst using FFT functions??

I have included the source code below, can somebody explain why they
have done this?



//FastConvo.c FIR filter implemented using overlap-add fast convolution

#include "DSK6713_AIC23.h"
Uint32 fs=DSK6713_AIC23_FREQ_8KHZ;

#include <math.h>
#include "coeffs.h"               //time domain FIR coefficients
#define PI 3.14159265358979
#define PTS 256                   //number of points for FFT
#define SQRT_PTS 16               //used in twiddle factor calc.
#define RADIX 2			    //passed to TI FFT routines
#define DELTA (2*PI)/PTS
typedef struct Complex_tag {float real, imag;} COMPLEX ;
#pragma DATA_ALIGN(W, sizeof(COMPLEX))
#pragma DATA_ALIGN(samples, sizeof(COMPLEX))
#pragma DATA_ALIGN(h, sizeof(COMPLEX))
COMPLEX W[PTS/RADIX] ;		    //twiddle factor array
COMPLEX samples[PTS];             //processing buffer
COMPLEX h[PTS];                   //FIR filter coefficients
short buffercount = 0;            //buffer count for iobuffer samples
float iobuffer[PTS/2];            //primary input/output buffer
float overlap[PTS/2];             //intermediate result buffer
short i;                          //index variable
short flag = 0;                   //set to indicate iobuffer full
float a, b;                       //variables used in complex multiply
short NUMCOEFFS = sizeof(coeffs)/sizeof(float);
short iTwid[SQRT_PTS] ;		    //PTS/2 + 1 > sqrt(PTS)

interrupt void c_int11(void)      //ISR
{
 output_sample((short)(iobuffer[buffercount]));
 iobuffer[buffercount++] = (float)((short)input_sample());
 if (buffercount >= PTS/2)        //for overlap-add method iobuffer
  {                               //is half size of FFT used
   buffercount = 0;
   flag = 1;
  }
}

main()
{                                 //set up array of twiddle factors
 digitrev_index(iTwid, PTS/RADIX, RADIX);
 for(i = 0 ; i < PTS/RADIX ; i++)
 {
   W[i].real = cos(DELTA*i);
   W[i].imag = sin(DELTA*i);
 }
 bitrev(W, iTwid, PTS/RADIX);	    //bit reverse order W

 for (i = 0 ; i<PTS ; i++)        //initialise PTS element
  {                               //of COMPLEX to hold real-valued
    h[i].real = 0.0;              //time domain FIR filter coefficients
    h[i].imag = 0.0;
  }
  for (i = 0 ; i < NUMCOEFFS ; i++)
  {                               //read FIR filter coeffs
    h[i].real = coeffs[i];        //NUMCOEFFS should be less than PTS/2
  }
  cfftr2_dit(h,W,PTS);            //transform filter coeffs
  comm_intr();                    //initialise DSK, codec, McBSP

  while(1)                        //frame processing infinite loop
   {
     while (flag == 0);           //wait for iobuffer full
       flag = 0;
     for (i = 0 ; i<PTS/2 ; i++)  //iobuffer into first half of
     {                            //samples buffer
      samples[i].real = iobuffer[i];
      iobuffer[i] = overlap[i];   //previously processed output
     }                            //to iobuffer
    for (i = 0 ; i<PTS/2 ; i++)
    {                             //second half of samples to overlap
      overlap[i] = samples[i+PTS/2].real;
      samples[i+PTS/2].real = 0.0;//zero-pad input from iobuffer
    }
    for (i=0 ; i<PTS ; i++)
      samples[i].imag = 0.0;      //init imag parts in samples buffer

    cfftr2_dit(samples,W,PTS);    //complex FFT function from TI

    for (i=0 ; i<PTS ; i++)       //frequency-domain representation
    {                             //complex multiply samples by h
      a = samples[i].real;
      b = samples[i].imag;
      samples[i].real = h[i].real*a - h[i].imag*b;
      samples[i].imag = h[i].real*b + h[i].imag*a;
    }

    icfftr2_dif(samples,W,PTS);   //inverse FFT function from TI

    for (i=0 ; i<PTS ; i++)
       samples[i].real /= PTS;
    for (i=0 ; i<PTS/2 ; i++)     //add first half of samples
       overlap[i] += samples[i].real; //to overlap
  }                               //end of while(1)
}                                 //end of main()

<tuurbo46@yahoo.co.uk> wrote in message
news:1139239180.395187.271000@o13g2000cwo.googlegroups.com...
> Hi > > Im currently in a bit of muddle. Im currently doing an assignment (in > c on TMS320C6713 DSK) and we have been told clearly that we cannot use > FFT and we must use Convolution. > > At this point im reading lots of web sites (TI.Com) etc and i have > found a convolution.c example program. However when i read this > example they are implementing convolution whilst using FFT functions??
Just because your assignment has some constraints doesn't mean others need to follow that too. This is a fast convolution technique that uses the FFT - this isn't very uncommon. Search for FIR example functions - that'll give you what you are looking for (convolution without using FFT). Cheers Bhaskar
> > I have included the source code below, can somebody explain why they > have done this? > > > > //FastConvo.c FIR filter implemented using overlap-add fast convolution > > #include "DSK6713_AIC23.h" > Uint32 fs=DSK6713_AIC23_FREQ_8KHZ; > > #include <math.h> > #include "coeffs.h" //time domain FIR coefficients > #define PI 3.14159265358979 > #define PTS 256 //number of points for FFT > #define SQRT_PTS 16 //used in twiddle factor calc. > #define RADIX 2 //passed to TI FFT routines > #define DELTA (2*PI)/PTS > typedef struct Complex_tag {float real, imag;} COMPLEX ; > #pragma DATA_ALIGN(W, sizeof(COMPLEX)) > #pragma DATA_ALIGN(samples, sizeof(COMPLEX)) > #pragma DATA_ALIGN(h, sizeof(COMPLEX)) > COMPLEX W[PTS/RADIX] ; //twiddle factor array > COMPLEX samples[PTS]; //processing buffer > COMPLEX h[PTS]; //FIR filter coefficients > short buffercount = 0; //buffer count for iobuffer samples > float iobuffer[PTS/2]; //primary input/output buffer > float overlap[PTS/2]; //intermediate result buffer > short i; //index variable > short flag = 0; //set to indicate iobuffer full > float a, b; //variables used in complex multiply > short NUMCOEFFS = sizeof(coeffs)/sizeof(float); > short iTwid[SQRT_PTS] ; //PTS/2 + 1 > sqrt(PTS) > > interrupt void c_int11(void) //ISR > { > output_sample((short)(iobuffer[buffercount])); > iobuffer[buffercount++] = (float)((short)input_sample()); > if (buffercount >= PTS/2) //for overlap-add method iobuffer > { //is half size of FFT used > buffercount = 0; > flag = 1; > } > } > > main() > { //set up array of twiddle factors > digitrev_index(iTwid, PTS/RADIX, RADIX); > for(i = 0 ; i < PTS/RADIX ; i++) > { > W[i].real = cos(DELTA*i); > W[i].imag = sin(DELTA*i); > } > bitrev(W, iTwid, PTS/RADIX); //bit reverse order W > > for (i = 0 ; i<PTS ; i++) //initialise PTS element > { //of COMPLEX to hold real-valued > h[i].real = 0.0; //time domain FIR filter coefficients > h[i].imag = 0.0; > } > for (i = 0 ; i < NUMCOEFFS ; i++) > { //read FIR filter coeffs > h[i].real = coeffs[i]; //NUMCOEFFS should be less than PTS/2 > } > cfftr2_dit(h,W,PTS); //transform filter coeffs > comm_intr(); //initialise DSK, codec, McBSP > > while(1) //frame processing infinite loop > { > while (flag == 0); //wait for iobuffer full > flag = 0; > for (i = 0 ; i<PTS/2 ; i++) //iobuffer into first half of > { //samples buffer > samples[i].real = iobuffer[i]; > iobuffer[i] = overlap[i]; //previously processed output > } //to iobuffer > for (i = 0 ; i<PTS/2 ; i++) > { //second half of samples to overlap > overlap[i] = samples[i+PTS/2].real; > samples[i+PTS/2].real = 0.0;//zero-pad input from iobuffer > } > for (i=0 ; i<PTS ; i++) > samples[i].imag = 0.0; //init imag parts in samples buffer > > cfftr2_dit(samples,W,PTS); //complex FFT function from TI > > for (i=0 ; i<PTS ; i++) //frequency-domain representation > { //complex multiply samples by h > a = samples[i].real; > b = samples[i].imag; > samples[i].real = h[i].real*a - h[i].imag*b; > samples[i].imag = h[i].real*b + h[i].imag*a; > } > > icfftr2_dif(samples,W,PTS); //inverse FFT function from TI > > for (i=0 ; i<PTS ; i++) > samples[i].real /= PTS; > for (i=0 ; i<PTS/2 ; i++) //add first half of samples > overlap[i] += samples[i].real; //to overlap > } //end of while(1) > } //end of main() >
Bhaskar Thiagarajan wrote:
> <tuurbo46@yahoo.co.uk> wrote in message > news:1139239180.395187.271000@o13g2000cwo.googlegroups.com... > >>Hi >> >>Im currently in a bit of muddle. Im currently doing an assignment (in >>c on TMS320C6713 DSK) and we have been told clearly that we cannot use >>FFT and we must use Convolution. >> >>At this point im reading lots of web sites (TI.Com) etc and i have >>found a convolution.c example program. However when i read this >>example they are implementing convolution whilst using FFT functions?? > > > Just because your assignment has some constraints doesn't mean others need > to follow that too. This is a fast convolution technique that uses the FFT - > this isn't very uncommon. > Search for FIR example functions - that'll give you what you are looking for > (convolution without using FFT). > > Cheers > Bhaskar
... Turbo46, I'm here to elaborate and to scold. I don't know who should be scolded, I presume it is the instructor. By the time one gets an assignment like yours, one should know that convolution in the time domain has the same effect as multiplication in the frequency domain (and vice versa). This means that it is possible to convolve in the time domain by using FFT to convert to frequency, multiplying, and then using IFFT co convert back to time. It may be surprising, but this procedure is faster than direct convolution for large enough sample sets. What books do you have access to? At your apparent level, I recommend Lyons. Understanding Digital Signal Processing http://www.amazon.com/gp/product/0131089897/sr=1-1/qid=1139254956/ref=pd_bbs_1/103-8646755-9878217?%5Fencoding=UTF8 Another good book, Smith, The Scientist and Engineer's Guide to Digital Signal Processing is available on line chapter by chapter in PDF at http://www.dspguide.com/. Chapter 16 deals with direct convolution, among other topics. Just as it is hard to manage a company whose business one doesn't understand, it is hard to write a good program to do something one doesn't understand. Jerry -- Engineering is the art of making what you want from things you can get. &#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;
"Jerry Avins" <jya@ieee.org> wrote in message
news:nYGdnaaze4b5NHreRVn-ow@rcn.net...

> I'm here to elaborate and to scold. I don't know who should be scolded, > I presume it is the instructor. By the time one gets an assignment like > yours, one should know that convolution in the time domain has the same > effect as multiplication in the frequency domain (and vice versa). This > means that it is possible to convolve in the time domain by using FFT to > convert to frequency, multiplying, and then using IFFT co convert back > to time. It may be surprising, but this procedure is faster than direct > convolution for large enough sample sets.
There is another technique for computing a 2**n-point circular convolution that doesn't compute the 2**n-point DFT of the inputs and the inverse 2**n-point DFT to get the outputs. Perhaps that is what your professor was referring to. You can get a code generator at http://home.comcast.net/~kmbtib/conv2c.f90 . Since you probably don't have a Fortran compiler, you can get one quite easily at http://www.g95.org . The generated code, while in Fortran, can be converted to C by putting a semicolon at the end of each line and a few other minor changes. -- write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, & 6.0134700243160014d-154/),(/'x'/)); end
James Van Buskirk wrote:

   ...

> There is another technique for computing a 2**n-point circular > convolution that doesn't compute the 2**n-point DFT of the > inputs and the inverse 2**n-point DFT to get the outputs. > Perhaps that is what your professor was referring to. You > can get a code generator at > > http://home.comcast.net/~kmbtib/conv2c.f90 .
That's a lot of code to read and understand. Is there a short explanation of how fast convolution is done without FFT etc.? Jerry -- Engineering is the art of making what you want from things you can get. &#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;
"Jerry Avins" <jya@ieee.org> wrote in message
news:GoCdnUIZUJmXUHrenZ2dnUVZ_sCdnZ2d@rcn.net...

> That's a lot of code to read and understand. Is there a short > explanation of how fast convolution is done without FFT etc.?
I didn't say it didn't use the FFT, just that it doesn't use a 2**k-point DFT to compute a 2**k-point circular convolution. The explanation hasn't been accepted for publication (yet). There are a couple of differences. Consider what the order-N real-half complex DFT matrix RF(N) does to an order-N circular convolution matrix C(N): C(N) = transpose(RF(N))*transpose(inverse(RF(N)))*C(N)* inverse(RF(N))*RF(N) The product of the central 3 factors above is permutation- equivalent to a 2-block-diagonal matrix (this is the content of the convolution theorem.) If RF(N) were replaced by RF(N/2).tensor.RF(2), the central 3 factors would now be permutation equivalent to a 4-block-diagonal matrix. The middle part is now slightly harder to compute, but in terms of total arithmetic operation counts the overall algorithm is slightly shorter. Additionally, using two-way SIMD is a snap when computing RF(N/2).tensor.RF(2), and not as easily conceived with RF(N). There is a further identity which can be considered to be the basis for split-radix algorithms: RF(N) = P(N,N/2)*(RF(N/2).DirectSum.OT(N/2))*(RF(2).tensor.identity(N/2)) the matrix OT(N/2) above need not be computed in its entirety because we can factor it into DS(N/2)*SOT(N/2) where DS(N/2) is diagonal and SOT(N/2) requires fewer arithmetic operations to compute than OT(N/2). Applying this concept reduces the total arithmetic operation count a little further. -- write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, & 6.0134700243160014d-154/),(/'x'/)); end
"James Van Buskirk" <not_valid@comcast.net> wrote in message
news:XPOdnftiRZ48SnreRVn-tg@comcast.com...

> The product of the central 3 factors above is permutation- > equivalent to a 2-block-diagonal matrix (this is the content > of the convolution theorem.) If RF(N) were replaced by > RF(N/2).tensor.RF(2), the central 3 factors would now be > permutation equivalent to a 4-block-diagonal matrix. The middle > part is now slightly harder to compute, but in terms of > total arithmetic operation counts the overall algorithm is > slightly shorter. Additionally, using two-way SIMD is a > snap when computing RF(N/2).tensor.RF(2), and not as easily > conceived with RF(N).
Oh my, did I really say RF(N/2).tensor.RF(2) in the above paragraph? I should have said RF(N/2).tensor.identity(2) instead. Sorry. -- write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, & 6.0134700243160014d-154/),(/'x'/)); end
Jerry Avins wrote:

..
> I'm here to elaborate and to scold. I don't know who should be scolded, > I presume it is the instructor. By the time one gets an assignment like > yours, one should know that convolution in the time domain has the same > effect as multiplication in the frequency domain (and vice versa). This > means that it is possible to convolve in the time domain by using FFT to > convert to frequency, multiplying, and then using IFFT co convert back > to time. It may be surprising, but this procedure is faster than direct > convolution for large enough sample sets. >
I can imagine a pedagogical purpose to the idea - how better to appreciate the performance of the FFT than to first implement direct convolution? And in any case, the dc algorithm is still appropriate for small N. Otherwise, if one simply accepts ex cathedra "FFT is better", one might never have a reason to learn how to code a dc. Its perhaps the equivalent of those torture tests apprentice engineers used to be given in "the old days" - take a piece of round rod, and square it up just using a file and hand measuring tools. Then file it round again, ditto. Not a test I ever took, nor could pass. Richard Dobson
Richard Dobson wrote:
> Jerry Avins wrote: > > .. > >> I'm here to elaborate and to scold. I don't know who should be >> scolded, I presume it is the instructor. By the time one gets an >> assignment like yours, one should know that convolution in the time >> domain has the same effect as multiplication in the frequency domain >> (and vice versa). This means that it is possible to convolve in the >> time domain by using FFT to convert to frequency, multiplying, and >> then using IFFT co convert back to time. It may be surprising, but >> this procedure is faster than direct convolution for large enough >> sample sets. >> > > I can imagine a pedagogical purpose to the idea - how better to > appreciate the performance of the FFT than to first implement direct > convolution? And in any case, the dc algorithm is still appropriate for > small N. Otherwise, if one simply accepts ex cathedra "FFT is better", > one might never have a reason to learn how to code a dc. > > Its perhaps the equivalent of those torture tests apprentice engineers > used to be given in "the old days" - take a piece of round rod, and > square it up just using a file and hand measuring tools. Then file it > round again, ditto. Not a test I ever took, nor could pass.
As an amateur silversmith, I make square wire from round with hammer ans anvil block. One of the most impressive bit of blacksmithing I ever saw was the transformation of a 1" square, 8" long rod into a 2" cube by hammering it only on its ends. Much depends on how it is heated in the forge and with a torch. My scolding has to do with the OP's not knowing what convolution is aside from a bit of code he can cut and paste. Jerry -- Engineering is the art of making what you want from things you can get. &#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;
Jerry Avins wrote:
> Richard Dobson wrote: > > Jerry Avins wrote: > > > > .. > > > >> I'm here to elaborate and to scold. I don't know who should be > >> scolded, I presume it is the instructor. By the time one gets an > >> assignment like yours, one should know that convolution in the time > >> domain has the same effect as multiplication in the frequency domain > >> (and vice versa). This means that it is possible to convolve in the > >> time domain by using FFT to convert to frequency, multiplying, and > >> then using IFFT co convert back to time. It may be surprising, but > >> this procedure is faster than direct convolution for large enough > >> sample sets. > >> > > > > I can imagine a pedagogical purpose to the idea - how better to > > appreciate the performance of the FFT than to first implement direct > > convolution? And in any case, the dc algorithm is still appropriate for > > small N. Otherwise, if one simply accepts ex cathedra "FFT is better", > > one might never have a reason to learn how to code a dc. > > > > Its perhaps the equivalent of those torture tests apprentice engineers > > used to be given in "the old days" - take a piece of round rod, and > > square it up just using a file and hand measuring tools. Then file it > > round again, ditto. Not a test I ever took, nor could pass. > > As an amateur silversmith, I make square wire from round with hammer ans > anvil block. One of the most impressive bit of blacksmithing I ever saw > was the transformation of a 1" square, 8" long rod into a 2" cube by > hammering it only on its ends. Much depends on how it is heated in the > forge and with a torch.
I never assigned that sort of "torture" to others, but at one point I did try to implement an Singular Value Decomposition from scratch. While I never quite got there, I did find out how much I am willing to pay for a good numerical library... Rune