Reply by Greg Heath January 30, 20112011-01-30
On Jan 30, 4:46&#4294967295;pm, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> >On Jan 28, 1:32=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote: > >> >On Jan 28, 1:11=3DA0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> > wrot= > >e: > >> >> >On Jan 28, 12:43=3D3DA0am, "all4dsp" > > <all4dsp@n_o_s_p_a_m.comcast.net= > > > > > > > > >> >> >wrote: > >> >> >> Hi, > > >> >> >> I need to IDFT a large column array where only something like > >> 0.00001% > >> >> of > >> >> >> the FFT coefficients are non-zero. I've tried using the > definition > >> IDF=3D > >> >T > >> >> b=3D3D > >> >> >ut > >> >> >> find that Matlab (or, FFTW) IFFT function is still faster. > > >> >> >> As both of these methods are still too slow, I wonder if anyone > >> could > >> >> >> recommend a pruned IDFT algorithm to speed things up while > >> maintaining > >> >> >> accuracy. My largest array size would be 536M rows x 1 column, > and > >> up > >> >> to > >> >> >> 4000 non-zero FFT coefficients. The data is all real. I'll > implemen= > >t > >> i=3D > >> >n > >> >> >> Matlab as well as C. > > >> >> >is there any rhyme or reason to which 4K bins outa 536M are being > >> >> >used? =3DA0is there any grouping or pattern or are they at "random" > >> >> >frequencies? > > >> >> >r b-j > > >> >> >> Thanks in advance for any comments. > > >> >> There are two applications: > > >> >> The first application has chronological groupings of 4-20 bins, each > >> grou=3D > >> >p > >> >> appearing randomly throughout the spectrum. > > >> >do you mean "contiguous" groupings of 4-20 bins? > > >> >> For example if there are 100M > >> >> FFT coefficients, the first three groups have bin indices 104-108, > >> >> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so > groups) > >> ar=3D > >> >e > >> >> non-zero. > > >> >can two (or more) groupings overlap? =A0if they do, are they > identified > >> >simply as a single grouping? > > >> >> The second application would be one spectrum containing only one > >> group's > >> >> indices. For example, the first spectrum would contain only bins > >> 104-108 > >> >> non-zero. A separate analysis/spectrum would contain only bins > >> 42812-4282=3D > >> >2 > >> >> bins non-zero. etc. For each group there would be a separate > analysis > >> >> containing only that groups non-zero indices. > > >> >i dunno, perhaps having separate, optimized FFTs for each possible > >> >group size (like a little FFT for 4 bins, another for 8 bins, another > >> >for 16 bins, and others for 6 or 12 or 20 bins). =A0you can FFT the > >> >group (as if it started at bin 0), and then adjust the group's phase > >> >(with one big phase offset for each group), depending on where the > >> >group is offset into the whole input spectrum. > > >> >r b-j > > >> >do you mean "contiguous" groupings of 4-20 bins? > > >> YES > > >> >can two (or more) groupings overlap? =A0if they do, are they > identified > >> >simply as a single grouping? > > >> NO > > >> (note, I have the FFT coefficients and I'm trying to get the IDFT > >> computed) > > >> Interestingly, the dominant time consumption in computing the IFFT > using > >> FFTW is creating the large complex array with mostly zeros. It takes > long= > >er > >> to do this than compute the IFFT. Even if I could optimize the IFFT time > = > >to > >> be very short, I'd still have the overhead of creating the input array. > >> Maybe this is a Matlab issue, and it'll speed up once in C (anyone know > > >In MATLAB, even if X is complex, do not initialize with > > >clear all > >tic > >for i =3D 1:100 > > &#4294967295; X =3D zeros(20e6,1); > >end > >toc &#4294967295; &#4294967295;% elapsed_time =3D 25.891 > > >%Instead, just use > > >clear all > >tic > >for i =3D 1:100 > > &#4294967295; X(20e6,1) =3D 0; > >end > >toc &#4294967295; % elapsed_time =3D 0.25 > > >Then fill in the small number of nonzero complex components. > > >Hope this helps. > > >Greg > > Thanks Greg, > > I tried your suggestion but it didn't make much of a difference. From the > best I can tell, regardless of how the mostly-zero array is created, when > this array is sent into ifft (or fft), matlab converts all the zeros to > complex zeros and this is the dominant time consumer. > > It seems one can either create a complex array of mostly zeros and populate > with the few non-zero complex numbers (in which case the major time-hit > occurs up front), or one can create a fast non-complex array and then > populate it with complex numbers (which also if fast), but then the ifft > process converts the zeros to complex zeros (and the time-hit occurs at > this point).- Hide quoted text - > > - Show quoted text -
There is still a significant time difference between just intializing the last element with, approximately, a complex zero and initializing all of the elements with complex zeros. The approximation is needed because if just the last element is initialized as a comlex zero, MATLAB creates a real zero matrix. % Instead of using clc, clear all tic for i = 1:100 X3 = complex(zeros(10e6,1),zeros(10e6,1)); end toc % elapsed_time = 62.375 whos % just use clear all tic for i = 1:100 X4(10e6,1) = 0+j*realmin; % realmin = 2.2251e-308 end toc % elapsed_time = 4.437 whos Hope this helps. Greg
Reply by all4dsp January 30, 20112011-01-30
>On Jan 28, 1:32=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote: >> >On Jan 28, 1:11=3DA0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net>
wrot=
>e: >> >> >On Jan 28, 12:43=3D3DA0am, "all4dsp"
<all4dsp@n_o_s_p_a_m.comcast.net=
>> >> >> >wrote: >> >> >> Hi, >> >> >> >> I need to IDFT a large column array where only something like >> 0.00001% >> >> of >> >> >> the FFT coefficients are non-zero. I've tried using the
definition
>> IDF=3D >> >T >> >> b=3D3D >> >> >ut >> >> >> find that Matlab (or, FFTW) IFFT function is still faster. >> >> >> >> As both of these methods are still too slow, I wonder if anyone >> could >> >> >> recommend a pruned IDFT algorithm to speed things up while >> maintaining >> >> >> accuracy. My largest array size would be 536M rows x 1 column,
and
>> up >> >> to >> >> >> 4000 non-zero FFT coefficients. The data is all real. I'll
implemen=
>t >> i=3D >> >n >> >> >> Matlab as well as C. >> >> >> >is there any rhyme or reason to which 4K bins outa 536M are being >> >> >used? =3DA0is there any grouping or pattern or are they at "random" >> >> >frequencies? >> >> >> >r b-j >> >> >> >> Thanks in advance for any comments. >> >> >> There are two applications: >> >> >> The first application has chronological groupings of 4-20 bins, each >> grou=3D >> >p >> >> appearing randomly throughout the spectrum. >> >> >do you mean "contiguous" groupings of 4-20 bins? >> >> >> For example if there are 100M >> >> FFT coefficients, the first three groups have bin indices 104-108, >> >> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so
groups)
>> ar=3D >> >e >> >> non-zero. >> >> >can two (or more) groupings overlap? =A0if they do, are they
identified
>> >simply as a single grouping? >> >> >> The second application would be one spectrum containing only one >> group's >> >> indices. For example, the first spectrum would contain only bins >> 104-108 >> >> non-zero. A separate analysis/spectrum would contain only bins >> 42812-4282=3D >> >2 >> >> bins non-zero. etc. For each group there would be a separate
analysis
>> >> containing only that groups non-zero indices. >> >> >i dunno, perhaps having separate, optimized FFTs for each possible >> >group size (like a little FFT for 4 bins, another for 8 bins, another >> >for 16 bins, and others for 6 or 12 or 20 bins). =A0you can FFT the >> >group (as if it started at bin 0), and then adjust the group's phase >> >(with one big phase offset for each group), depending on where the >> >group is offset into the whole input spectrum. >> >> >r b-j >> >> >do you mean "contiguous" groupings of 4-20 bins? >> >> YES >> >> >can two (or more) groupings overlap? =A0if they do, are they
identified
>> >simply as a single grouping? >> >> NO >> >> (note, I have the FFT coefficients and I'm trying to get the IDFT >> computed) >> >> Interestingly, the dominant time consumption in computing the IFFT
using
>> FFTW is creating the large complex array with mostly zeros. It takes
long=
>er >> to do this than compute the IFFT. Even if I could optimize the IFFT time
=
>to >> be very short, I'd still have the overhead of creating the input array. >> Maybe this is a Matlab issue, and it'll speed up once in C (anyone know > >In MATLAB, even if X is complex, do not initialize with > >clear all >tic >for i =3D 1:100 > X =3D zeros(20e6,1); >end >toc % elapsed_time =3D 25.891 > >%Instead, just use > >clear all >tic >for i =3D 1:100 > X(20e6,1) =3D 0; >end >toc % elapsed_time =3D 0.25 > >Then fill in the small number of nonzero complex components. > >Hope this helps. > >Greg >
Thanks Greg, I tried your suggestion but it didn't make much of a difference. From the best I can tell, regardless of how the mostly-zero array is created, when this array is sent into ifft (or fft), matlab converts all the zeros to complex zeros and this is the dominant time consumer. It seems one can either create a complex array of mostly zeros and populate with the few non-zero complex numbers (in which case the major time-hit occurs up front), or one can create a fast non-complex array and then populate it with complex numbers (which also if fast), but then the ifft process converts the zeros to complex zeros (and the time-hit occurs at this point).
Reply by Greg Heath January 30, 20112011-01-30
On Jan 28, 1:32&#4294967295;am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> >On Jan 28, 1:11=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote: > >> >On Jan 28, 12:43=3DA0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> > >> >wrote: > >> >> Hi, > > >> >> I need to IDFT a large column array where only something like > 0.00001% > >> of > >> >> the FFT coefficients are non-zero. I've tried using the definition > IDF= > >T > >> b=3D > >> >ut > >> >> find that Matlab (or, FFTW) IFFT function is still faster. > > >> >> As both of these methods are still too slow, I wonder if anyone > could > >> >> recommend a pruned IDFT algorithm to speed things up while > maintaining > >> >> accuracy. My largest array size would be 536M rows x 1 column, and > up > >> to > >> >> 4000 non-zero FFT coefficients. The data is all real. I'll implement > i= > >n > >> >> Matlab as well as C. > > >> >is there any rhyme or reason to which 4K bins outa 536M are being > >> >used? =A0is there any grouping or pattern or are they at "random" > >> >frequencies? > > >> >r b-j > > >> >> Thanks in advance for any comments. > > >> There are two applications: > > >> The first application has chronological groupings of 4-20 bins, each > grou= > >p > >> appearing randomly throughout the spectrum. > > >do you mean "contiguous" groupings of 4-20 bins? > > >> For example if there are 100M > >> FFT coefficients, the first three groups have bin indices 104-108, > >> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so groups) > ar= > >e > >> non-zero. > > >can two (or more) groupings overlap? &#4294967295;if they do, are they identified > >simply as a single grouping? > > >> The second application would be one spectrum containing only one > group's > >> indices. For example, the first spectrum would contain only bins > 104-108 > >> non-zero. A separate analysis/spectrum would contain only bins > 42812-4282= > >2 > >> bins non-zero. etc. For each group there would be a separate analysis > >> containing only that groups non-zero indices. > > >i dunno, perhaps having separate, optimized FFTs for each possible > >group size (like a little FFT for 4 bins, another for 8 bins, another > >for 16 bins, and others for 6 or 12 or 20 bins). &#4294967295;you can FFT the > >group (as if it started at bin 0), and then adjust the group's phase > >(with one big phase offset for each group), depending on where the > >group is offset into the whole input spectrum. > > >r b-j > > >do you mean "contiguous" groupings of 4-20 bins? > > YES > > >can two (or more) groupings overlap? &#4294967295;if they do, are they identified > >simply as a single grouping? > > NO > > (note, I have the FFT coefficients and I'm trying to get the IDFT > computed) > > Interestingly, the dominant time consumption in computing the IFFT using > FFTW is creating the large complex array with mostly zeros. It takes longer > to do this than compute the IFFT. Even if I could optimize the IFFT time to > be very short, I'd still have the overhead of creating the input array. > Maybe this is a Matlab issue, and it'll speed up once in C (anyone know
In MATLAB, even if X is complex, do not initialize with clear all tic for i = 1:100 X = zeros(20e6,1); end toc % elapsed_time = 25.891 %Instead, just use clear all tic for i = 1:100 X(20e6,1) = 0; end toc % elapsed_time = 0.25 Then fill in the small number of nonzero complex components. Hope this helps. Greg
Reply by Tim Wescott January 28, 20112011-01-28
On 01/28/2011 08:36 AM, all4dsp wrote:
>> On 01/27/2011 10:32 PM, all4dsp wrote: >>>> On Jan 28, 1:11=A0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net> > wrote: >>>>>> On Jan 28, 12:43=3DA0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net> >>>>>> wrote: >>>>>>> Hi, >>>>> >>>>>>> I need to IDFT a large column array where only something like >>> 0.00001% >>>>> of >>>>>>> the FFT coefficients are non-zero. I've tried using the definition >>> IDF= >>>> T >>>>> b=3D >>>>>> ut >>>>>>> find that Matlab (or, FFTW) IFFT function is still faster. >>>>> >>>>>>> As both of these methods are still too slow, I wonder if anyone >>> could >>>>>>> recommend a pruned IDFT algorithm to speed things up while >>> maintaining >>>>>>> accuracy. My largest array size would be 536M rows x 1 column, and >>> up >>>>> to >>>>>>> 4000 non-zero FFT coefficients. The data is all real. I'll > implement >>> i= >>>> n >>>>>>> Matlab as well as C. >>>>> >>>>>> is there any rhyme or reason to which 4K bins outa 536M are being >>>>>> used? =A0is there any grouping or pattern or are they at "random" >>>>>> frequencies? >>>>> >>>>>> r b-j >>>>> >>>>>>> Thanks in advance for any comments. >>>>> >>>>> There are two applications: >>>>> >>>>> The first application has chronological groupings of 4-20 bins, each >>> grou= >>>> p >>>>> appearing randomly throughout the spectrum. >>>> >>>> do you mean "contiguous" groupings of 4-20 bins? >>>> >>>>> For example if there are 100M >>>>> FFT coefficients, the first three groups have bin indices 104-108, >>>>> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so > groups) >>> ar= >>>> e >>>>> non-zero. >>>> >>>> can two (or more) groupings overlap? if they do, are they identified >>>> simply as a single grouping? >>>> >>>>> The second application would be one spectrum containing only one >>> group's >>>>> indices. For example, the first spectrum would contain only bins >>> 104-108 >>>>> non-zero. A separate analysis/spectrum would contain only bins >>> 42812-4282= >>>> 2 >>>>> bins non-zero. etc. For each group there would be a separate analysis >>>>> containing only that groups non-zero indices. >>>> >>>> i dunno, perhaps having separate, optimized FFTs for each possible >>>> group size (like a little FFT for 4 bins, another for 8 bins, another >>>> for 16 bins, and others for 6 or 12 or 20 bins). you can FFT the >>>> group (as if it started at bin 0), and then adjust the group's phase >>>> (with one big phase offset for each group), depending on where the >>>> group is offset into the whole input spectrum. >>>> >>>> r b-j >>>> >>> >>> >>> >>> >>>> do you mean "contiguous" groupings of 4-20 bins? >>> >>> YES >>> >>>> can two (or more) groupings overlap? if they do, are they identified >>>> simply as a single grouping? >>> >>> NO >>> >>> >>> (note, I have the FFT coefficients and I'm trying to get the IDFT >>> computed) >>> >>> Interestingly, the dominant time consumption in computing the IFFT > using >>> FFTW is creating the large complex array with mostly zeros. It takes > longer >>> to do this than compute the IFFT. Even if I could optimize the IFFT time > to >>> be very short, I'd still have the overhead of creating the input array. >>> Maybe this is a Matlab issue, and it'll speed up once in C (anyone > know?). >> >> Matlab has a sparse array format -- how quick does that work? It'll >> have to convert for IFFT, but it wouldn't have to for your sped-up > version. >> >> How are you creating the array? In Scilab it'd be >> >> my_array = zeros(bazzilion, 1); >> >> with bazzilion equal to your desired (large) array size. >> >> -- >> >> Tim Wescott >> Wescott Design Services >> http://www.wescottdesign.com >> >> Do you need to implement control loops in software? >> "Applied Control Theory for Embedded Systems" was written for you. >> See details at http://www.wescottdesign.com/actfes/actfes.html > > > I keep it pretty simple. The IDFT definition implementation in Matlab looks > something like: > > for mm = 1 : length(bins) > ttrend = ttrend + 2*real( fftcoef(mm).*( > exp(1j*2*pi*(bins(mm)-1)*(nn-1)/FFTn) )); > end > > I could even vectorize the loop using a matrix: > > ttrend = 2*real( ( exp(1j*2*pi*(nn-1)*(bins-1)/FFTn) )*fftcoef); > > but I found filling the memory with the matrix increases execution time 5x, > so I leave it as a for loop above. Either way there's no need for sparse > arrays/matrices. Keeping the exp function is faster than expanding into sin > and cos, even if you only keep the two real terms after complex > multiplication with fftcoef. > > The FFTW way is: > > fft_dj = zeros(FFTn, 1); % preallocate; array turns complex once populate > it with complex numbers > fft_dj(bins) = fftcoef; % populate first half of spectrum including Nyquist > bin > fft_dj(FFTn-bins+2) = conj(fft_dj(bins)); % populate other half of > spectrum > timetrend = real(ifft(fft_dj)); > > Unfortunately I don't know C so I'm having trouble creating a mex file for > the IDFT definition method. If anyone's willing to convert it I can supply > the full code, which is only 10 lines or so. >
Unless there's someone out there that does MEX files all the time this'll take an inordinate amount of time -- there's probably an hour or less to actually implement the algorithm, but possibly a day in all screwing around getting everything to compile correctly. It might be time to learn C... -- Tim Wescott Wescott Design Services http://www.wescottdesign.com Do you need to implement control loops in software? "Applied Control Theory for Embedded Systems" was written for you. See details at http://www.wescottdesign.com/actfes/actfes.html
Reply by all4dsp January 28, 20112011-01-28
>On 01/27/2011 10:32 PM, all4dsp wrote: >>> On Jan 28, 1:11=A0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net>
wrote:
>>>>> On Jan 28, 12:43=3DA0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net> >>>>> wrote: >>>>>> Hi, >>>> >>>>>> I need to IDFT a large column array where only something like >> 0.00001% >>>> of >>>>>> the FFT coefficients are non-zero. I've tried using the definition >> IDF= >>> T >>>> b=3D >>>>> ut >>>>>> find that Matlab (or, FFTW) IFFT function is still faster. >>>> >>>>>> As both of these methods are still too slow, I wonder if anyone >> could >>>>>> recommend a pruned IDFT algorithm to speed things up while >> maintaining >>>>>> accuracy. My largest array size would be 536M rows x 1 column, and >> up >>>> to >>>>>> 4000 non-zero FFT coefficients. The data is all real. I'll
implement
>> i= >>> n >>>>>> Matlab as well as C. >>>> >>>>> is there any rhyme or reason to which 4K bins outa 536M are being >>>>> used? =A0is there any grouping or pattern or are they at "random" >>>>> frequencies? >>>> >>>>> r b-j >>>> >>>>>> Thanks in advance for any comments. >>>> >>>> There are two applications: >>>> >>>> The first application has chronological groupings of 4-20 bins, each >> grou= >>> p >>>> appearing randomly throughout the spectrum. >>> >>> do you mean "contiguous" groupings of 4-20 bins? >>> >>>> For example if there are 100M >>>> FFT coefficients, the first three groups have bin indices 104-108, >>>> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so
groups)
>> ar= >>> e >>>> non-zero. >>> >>> can two (or more) groupings overlap? if they do, are they identified >>> simply as a single grouping? >>> >>>> The second application would be one spectrum containing only one >> group's >>>> indices. For example, the first spectrum would contain only bins >> 104-108 >>>> non-zero. A separate analysis/spectrum would contain only bins >> 42812-4282= >>> 2 >>>> bins non-zero. etc. For each group there would be a separate analysis >>>> containing only that groups non-zero indices. >>> >>> i dunno, perhaps having separate, optimized FFTs for each possible >>> group size (like a little FFT for 4 bins, another for 8 bins, another >>> for 16 bins, and others for 6 or 12 or 20 bins). you can FFT the >>> group (as if it started at bin 0), and then adjust the group's phase >>> (with one big phase offset for each group), depending on where the >>> group is offset into the whole input spectrum. >>> >>> r b-j >>> >> >> >> >> >>> do you mean "contiguous" groupings of 4-20 bins? >> >> YES >> >>> can two (or more) groupings overlap? if they do, are they identified >>> simply as a single grouping? >> >> NO >> >> >> (note, I have the FFT coefficients and I'm trying to get the IDFT >> computed) >> >> Interestingly, the dominant time consumption in computing the IFFT
using
>> FFTW is creating the large complex array with mostly zeros. It takes
longer
>> to do this than compute the IFFT. Even if I could optimize the IFFT time
to
>> be very short, I'd still have the overhead of creating the input array. >> Maybe this is a Matlab issue, and it'll speed up once in C (anyone
know?).
> >Matlab has a sparse array format -- how quick does that work? It'll >have to convert for IFFT, but it wouldn't have to for your sped-up
version.
> >How are you creating the array? In Scilab it'd be > >my_array = zeros(bazzilion, 1); > >with bazzilion equal to your desired (large) array size. > >-- > >Tim Wescott >Wescott Design Services >http://www.wescottdesign.com > >Do you need to implement control loops in software? >"Applied Control Theory for Embedded Systems" was written for you. >See details at http://www.wescottdesign.com/actfes/actfes.html
I keep it pretty simple. The IDFT definition implementation in Matlab looks something like: for mm = 1 : length(bins) ttrend = ttrend + 2*real( fftcoef(mm).*( exp(1j*2*pi*(bins(mm)-1)*(nn-1)/FFTn) )); end I could even vectorize the loop using a matrix: ttrend = 2*real( ( exp(1j*2*pi*(nn-1)*(bins-1)/FFTn) )*fftcoef); but I found filling the memory with the matrix increases execution time 5x, so I leave it as a for loop above. Either way there's no need for sparse arrays/matrices. Keeping the exp function is faster than expanding into sin and cos, even if you only keep the two real terms after complex multiplication with fftcoef. The FFTW way is: fft_dj = zeros(FFTn, 1); % preallocate; array turns complex once populate it with complex numbers fft_dj(bins) = fftcoef; % populate first half of spectrum including Nyquist bin fft_dj(FFTn-bins+2) = conj(fft_dj(bins)); % populate other half of spectrum timetrend = real(ifft(fft_dj)); Unfortunately I don't know C so I'm having trouble creating a mex file for the IDFT definition method. If anyone's willing to convert it I can supply the full code, which is only 10 lines or so.
Reply by Tim Wescott January 28, 20112011-01-28
On 01/27/2011 10:32 PM, all4dsp wrote:
>> On Jan 28, 1:11=A0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net> wrote: >>>> On Jan 28, 12:43=3DA0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net> >>>> wrote: >>>>> Hi, >>> >>>>> I need to IDFT a large column array where only something like > 0.00001% >>> of >>>>> the FFT coefficients are non-zero. I've tried using the definition > IDF= >> T >>> b=3D >>>> ut >>>>> find that Matlab (or, FFTW) IFFT function is still faster. >>> >>>>> As both of these methods are still too slow, I wonder if anyone > could >>>>> recommend a pruned IDFT algorithm to speed things up while > maintaining >>>>> accuracy. My largest array size would be 536M rows x 1 column, and > up >>> to >>>>> 4000 non-zero FFT coefficients. The data is all real. I'll implement > i= >> n >>>>> Matlab as well as C. >>> >>>> is there any rhyme or reason to which 4K bins outa 536M are being >>>> used? =A0is there any grouping or pattern or are they at "random" >>>> frequencies? >>> >>>> r b-j >>> >>>>> Thanks in advance for any comments. >>> >>> There are two applications: >>> >>> The first application has chronological groupings of 4-20 bins, each > grou= >> p >>> appearing randomly throughout the spectrum. >> >> do you mean "contiguous" groupings of 4-20 bins? >> >>> For example if there are 100M >>> FFT coefficients, the first three groups have bin indices 104-108, >>> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so groups) > ar= >> e >>> non-zero. >> >> can two (or more) groupings overlap? if they do, are they identified >> simply as a single grouping? >> >>> The second application would be one spectrum containing only one > group's >>> indices. For example, the first spectrum would contain only bins > 104-108 >>> non-zero. A separate analysis/spectrum would contain only bins > 42812-4282= >> 2 >>> bins non-zero. etc. For each group there would be a separate analysis >>> containing only that groups non-zero indices. >> >> i dunno, perhaps having separate, optimized FFTs for each possible >> group size (like a little FFT for 4 bins, another for 8 bins, another >> for 16 bins, and others for 6 or 12 or 20 bins). you can FFT the >> group (as if it started at bin 0), and then adjust the group's phase >> (with one big phase offset for each group), depending on where the >> group is offset into the whole input spectrum. >> >> r b-j >> > > > > >> do you mean "contiguous" groupings of 4-20 bins? > > YES > >> can two (or more) groupings overlap? if they do, are they identified >> simply as a single grouping? > > NO > > > (note, I have the FFT coefficients and I'm trying to get the IDFT > computed) > > Interestingly, the dominant time consumption in computing the IFFT using > FFTW is creating the large complex array with mostly zeros. It takes longer > to do this than compute the IFFT. Even if I could optimize the IFFT time to > be very short, I'd still have the overhead of creating the input array. > Maybe this is a Matlab issue, and it'll speed up once in C (anyone know?).
Matlab has a sparse array format -- how quick does that work? It'll have to convert for IFFT, but it wouldn't have to for your sped-up version. How are you creating the array? In Scilab it'd be my_array = zeros(bazzilion, 1); with bazzilion equal to your desired (large) array size. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com Do you need to implement control loops in software? "Applied Control Theory for Embedded Systems" was written for you. See details at http://www.wescottdesign.com/actfes/actfes.html
Reply by Tim Wescott January 28, 20112011-01-28
On 01/27/2011 09:43 PM, all4dsp wrote:
> Hi, > > I need to IDFT a large column array where only something like 0.00001% of > the FFT coefficients are non-zero. I've tried using the definition IDFT but > find that Matlab (or, FFTW) IFFT function is still faster. > > As both of these methods are still too slow, I wonder if anyone could > recommend a pruned IDFT algorithm to speed things up while maintaining > accuracy. My largest array size would be 536M rows x 1 column, and up to > 4000 non-zero FFT coefficients. The data is all real. I'll implement in > Matlab as well as C. > > Thanks in advance for any comments.
Are you trying to do your IDFT in Matlab code vs. Matlab's built-in IFFT? If so, you should see significant speed up using C. I'm not sure if you could significantly speed this up with some sort of bastardized IFFT routine that kept track of the butterflies with non-zero entries -- at some point the effort of book-keeping would overwhelm the savings of not having to go through everything. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com Do you need to implement control loops in software? "Applied Control Theory for Embedded Systems" was written for you. See details at http://www.wescottdesign.com/actfes/actfes.html
Reply by all4dsp January 28, 20112011-01-28
>On Jan 28, 1:11=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote: >> >On Jan 28, 12:43=3DA0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> >> >wrote: >> >> Hi, >> >> >> I need to IDFT a large column array where only something like
0.00001%
>> of >> >> the FFT coefficients are non-zero. I've tried using the definition
IDF=
>T >> b=3D >> >ut >> >> find that Matlab (or, FFTW) IFFT function is still faster. >> >> >> As both of these methods are still too slow, I wonder if anyone
could
>> >> recommend a pruned IDFT algorithm to speed things up while
maintaining
>> >> accuracy. My largest array size would be 536M rows x 1 column, and
up
>> to >> >> 4000 non-zero FFT coefficients. The data is all real. I'll implement
i=
>n >> >> Matlab as well as C. >> >> >is there any rhyme or reason to which 4K bins outa 536M are being >> >used? =A0is there any grouping or pattern or are they at "random" >> >frequencies? >> >> >r b-j >> >> >> Thanks in advance for any comments. >> >> There are two applications: >> >> The first application has chronological groupings of 4-20 bins, each
grou=
>p >> appearing randomly throughout the spectrum. > >do you mean "contiguous" groupings of 4-20 bins? > >> For example if there are 100M >> FFT coefficients, the first three groups have bin indices 104-108, >> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so groups)
ar=
>e >> non-zero. > >can two (or more) groupings overlap? if they do, are they identified >simply as a single grouping? > >> The second application would be one spectrum containing only one
group's
>> indices. For example, the first spectrum would contain only bins
104-108
>> non-zero. A separate analysis/spectrum would contain only bins
42812-4282=
>2 >> bins non-zero. etc. For each group there would be a separate analysis >> containing only that groups non-zero indices. > >i dunno, perhaps having separate, optimized FFTs for each possible >group size (like a little FFT for 4 bins, another for 8 bins, another >for 16 bins, and others for 6 or 12 or 20 bins). you can FFT the >group (as if it started at bin 0), and then adjust the group's phase >(with one big phase offset for each group), depending on where the >group is offset into the whole input spectrum. > >r b-j >
>do you mean "contiguous" groupings of 4-20 bins?
YES
>can two (or more) groupings overlap? if they do, are they identified >simply as a single grouping?
NO (note, I have the FFT coefficients and I'm trying to get the IDFT computed) Interestingly, the dominant time consumption in computing the IFFT using FFTW is creating the large complex array with mostly zeros. It takes longer to do this than compute the IFFT. Even if I could optimize the IFFT time to be very short, I'd still have the overhead of creating the input array. Maybe this is a Matlab issue, and it'll speed up once in C (anyone know?).
Reply by robert bristow-johnson January 28, 20112011-01-28
On Jan 28, 1:11&#4294967295;am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> >On Jan 28, 12:43=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> > >wrote: > >> Hi, > > >> I need to IDFT a large column array where only something like 0.00001% > of > >> the FFT coefficients are non-zero. I've tried using the definition IDFT > b= > >ut > >> find that Matlab (or, FFTW) IFFT function is still faster. > > >> As both of these methods are still too slow, I wonder if anyone could > >> recommend a pruned IDFT algorithm to speed things up while maintaining > >> accuracy. My largest array size would be 536M rows x 1 column, and up > to > >> 4000 non-zero FFT coefficients. The data is all real. I'll implement in > >> Matlab as well as C. > > >is there any rhyme or reason to which 4K bins outa 536M are being > >used? &#4294967295;is there any grouping or pattern or are they at "random" > >frequencies? > > >r b-j > > >> Thanks in advance for any comments. > > There are two applications: > > The first application has chronological groupings of 4-20 bins, each group > appearing randomly throughout the spectrum.
do you mean "contiguous" groupings of 4-20 bins?
> For example if there are 100M > FFT coefficients, the first three groups have bin indices 104-108, > 42812-42822, 89352-89359 (etc., there may be 200 hundred or so groups) are > non-zero.
can two (or more) groupings overlap? if they do, are they identified simply as a single grouping?
> The second application would be one spectrum containing only one group's > indices. For example, the first spectrum would contain only bins 104-108 > non-zero. A separate analysis/spectrum would contain only bins 42812-42822 > bins non-zero. etc. For each group there would be a separate analysis > containing only that groups non-zero indices.
i dunno, perhaps having separate, optimized FFTs for each possible group size (like a little FFT for 4 bins, another for 8 bins, another for 16 bins, and others for 6 or 12 or 20 bins). you can FFT the group (as if it started at bin 0), and then adjust the group's phase (with one big phase offset for each group), depending on where the group is offset into the whole input spectrum. r b-j
Reply by all4dsp January 28, 20112011-01-28
>On Jan 28, 12:43=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> >wrote: >> Hi, >> >> I need to IDFT a large column array where only something like 0.00001%
of
>> the FFT coefficients are non-zero. I've tried using the definition IDFT
b=
>ut >> find that Matlab (or, FFTW) IFFT function is still faster. >> >> As both of these methods are still too slow, I wonder if anyone could >> recommend a pruned IDFT algorithm to speed things up while maintaining >> accuracy. My largest array size would be 536M rows x 1 column, and up
to
>> 4000 non-zero FFT coefficients. The data is all real. I'll implement in >> Matlab as well as C. > >is there any rhyme or reason to which 4K bins outa 536M are being >used? is there any grouping or pattern or are they at "random" >frequencies? > >r b-j > > >> Thanks in advance for any comments. > >
There are two applications: The first application has chronological groupings of 4-20 bins, each group appearing randomly throughout the spectrum. For example if there are 100M FFT coefficients, the first three groups have bin indices 104-108, 42812-42822, 89352-89359 (etc., there may be 200 hundred or so groups) are non-zero. The second application would be one spectrum containing only one group's indices. For example, the first spectrum would contain only bins 104-108 non-zero. A separate analysis/spectrum would contain only bins 42812-42822 bins non-zero. etc. For each group there would be a separate analysis containing only that groups non-zero indices.