I searched the groups for fast convolutions, but did not find an answer relevant to my research, so here is the question.

I use a Gaussian to smooth the image. Making the filter separable (in row and column) is the normal approach (unless you have a lot of time to process a full 2D Gaussian). I am NOT doing recursive filtering, just filtering once. Some of the approaches I came across are:

1. FFTs/DFTs/overlap-add/overlap-save methods -> don't think it's worth the pain, as the Gaussian kernel is small (less than 10 samples). Maybe more time would be spent transforming the signals; I don't know, but I did not try it yet. I also remember something taught in school, that it would not be a good idea to transform the signals for small sequences.

2. SIMD/MIMD approaches. Sounds reasonable and logical. I use Intel MMX / Windows -> VC and want to get some references for SIMD/MIMD if this is a good approach.

3. Specialized hardware/FPGAs/DSPs. I cannot afford it.

Are there any other approaches that I am missing? If yes, please also point me to some references.

My other thought was this: the Gaussian is symmetrical, say g[-3], g[-2], g[-1], g[0], g[1], g[2], g[3], i.e. g[i] = g[-i]. Say the data sequence is data[0], data[1], ..., data[250], and data[i] is the current point under consideration. data[i-1] is multiplied by g[1] to give, say, A, and data[i+1] is multiplied by g[-1] to give, say, B. When the current point shifts to data[i+2], data[i+1] is multiplied by g[1], which (since g[1] = g[-1]) is the same product as B, so this multiplication need not be performed. So essentially only about half the multiplications, roughly (N+1)/2 instead of N per point, are performed for every point that is convolved, because of the symmetry of the Gaussian. (The same holds true for the derivative of the Gaussian, but with a sign change, since it is antisymmetric.)

Even though my idea is amorphous at this point, I don't want to re-invent the wheel. Does anybody have any references, or a better method, for faster convolution?

Thanks for all your help,
Ram
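A minimal sketch of the halved-multiplication idea described in the post above: instead of caching products across output positions, the same multiply count can be reached by adding the two mirrored samples first and multiplying the pair by the shared coefficient once. The function name, the `g[0..r]` half-kernel layout, and the clamped (edge-replicating) border handling are my own choices, not from the post.

```c
#include <stddef.h>

/* Smooth one row with a symmetric kernel given as its right half
 * g[0..r] (g[0] is the centre tap, g[j] stands for both g[j] and g[-j]).
 * Pairing mirrored samples gives r+1 multiplications per output sample
 * instead of 2r+1. Borders are clamped (edge samples replicated). */
void smooth_symmetric(const float *in, float *out, size_t n,
                      const float *g, int r)
{
    for (size_t i = 0; i < n; i++) {
        float acc = g[0] * in[i];                       /* centre tap */
        for (int j = 1; j <= r; j++) {
            size_t lo = (i >= (size_t)j) ? i - j : 0;   /* clamp left  */
            size_t hi = (i + (size_t)j < n) ? i + j : n - 1; /* clamp right */
            acc += g[j] * (in[lo] + in[hi]);  /* one multiply per pair */
        }
        out[i] = acc;
    }
}
```

For a separable 2D smoothing this would be run once over each row and once over each column of the transposed (or strided) result.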
Fast Convolution
Started by ●July 22, 2004
Reply by ●July 22, 2004
ip4ram@yahoo.com writes:

> 1. FFTs/DFTs/overlap-add/overlap-save methods -> don't think it's
> worth the pain, as the Gaussian kernel is small (less than 10
> samples). Maybe more time would be spent transforming the signals;
> I don't know, but I did not try it yet. I also remember something
> taught in school, that it would not be a good idea to transform the
> signals for small sequences.
>
> Are there any other approaches that I am missing? If yes, please
> also point me to some references.

Ok, here is a simple idea for a kernel of about 3*k samples in size: approximate the Gaussian with 3 piecewise quadratic curves. You can do that in the following manner:

    s0 = s1 = s2 = 0;  /* actually, this works even with arbitrary initialization! */
    for (i = 0; i < n; i++)
        a[i] = s2 += s1 += s0 += a[i];
    for (i = 0; i < n - 3*k; i++)
        a[i] = a[i+3*k] - 3*a[i+2*k] + 3*a[i+k] - a[i];

You definitely need to use some integer or other exact data type to do this (or the exploding s0, s1, s2 terms will not cancel out numerically), and the result has to be scaled down again afterwards.

This same "sparse" approximation technique can be used for arbitrarily shaped kernels that are approximated with B-splines of some fixed degree. The first for-loop basically convolves the input with the base B-spline; the second one then convolves with a sparse pulse train that generates the kernel shape from the basic B-spline.

Actually, this is my own idea: I have not found it in the literature. But it is a fairly obvious trick, so that is probably just for lack of looking.

-- David Kastrup, Kriemhildstr. 15, 44793 Bochum
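A self-contained sketch of the repeated-running-sum trick from the reply above, under my own naming and buffer conventions (not from the post). Three cascaded running sums followed by a sparse third difference at stride k convolve the signal with a piecewise-quadratic B-spline pulse of width 3k, a rough Gaussian. Exact integer arithmetic lets the growing partial sums cancel; the result carries a DC gain of k*k*k that the caller must scale out.

```c
#include <stddef.h>
#include <stdint.h>

/* Smooth in[0..n-1] with a width-3k quadratic B-spline (three cascaded
 * width-k box filters). out must have room for n entries; only the first
 * n - 3*k entries hold valid (unscaled, gain k^3) results. */
void bspline_smooth(const int32_t *in, int64_t *out, size_t n, size_t k)
{
    int64_t s0 = 0, s1 = 0, s2 = 0;   /* cascaded accumulators */
    int64_t *tmp = out;               /* reuse out as scratch  */

    /* triple running sum: 1 / (1 - z^-1)^3 */
    for (size_t i = 0; i < n; i++)
        tmp[i] = (s2 += (s1 += (s0 += in[i])));

    /* sparse third difference (1 - z^-k)^3 cancels the three summations;
     * writing at i while reading at i+k, i+2k, i+3k keeps this in-place */
    for (size_t i = 0; i + 3 * k < n; i++)
        out[i] = tmp[i + 3 * k] - 3 * tmp[i + 2 * k] + 3 * tmp[i + k] - tmp[i];
}
```

On a constant input every valid output equals k^3, and the impulse response for k = 2 is the expected {1, 3, 3, 1} pulse (the width-2 box convolved with itself three times), which is what makes the rescaling factor k^3 = 8.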