On Jan 30, 4:46�pm, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> >On Jan 28, 1:32=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> >> >On Jan 28, 1:11=3DA0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net>
> wrot=
> >e:
> >> >> >On Jan 28, 12:43=3D3DA0am, "all4dsp"
>
> <all4dsp@n_o_s_p_a_m.comcast.net=
>
>
>
>
>
>
>
> >> >> >wrote:
> >> >> >> Hi,
>
> >> >> >> I need to IDFT a large column array where only something like
> >> 0.00001%
> >> >> of
> >> >> >> the FFT coefficients are non-zero. I've tried using the
> definition
> >> IDF=3D
> >> >T
> >> >> b=3D3D
> >> >> >ut
> >> >> >> find that Matlab (or, FFTW) IFFT function is still faster.
>
> >> >> >> As both of these methods are still too slow, I wonder if anyone
> >> could
> >> >> >> recommend a pruned IDFT algorithm to speed things up while
> >> maintaining
> >> >> >> accuracy. My largest array size would be 536M rows x 1 column,
> and
> >> up
> >> >> to
> >> >> >> 4000 non-zero FFT coefficients. The data is all real. I'll
> implemen=
> >t
> >> i=3D
> >> >n
> >> >> >> Matlab as well as C.
>
> >> >> >is there any rhyme or reason to which 4K bins outa 536M are being
> >> >> >used? =3DA0is there any grouping or pattern or are they at "random"
> >> >> >frequencies?
>
> >> >> >r b-j
>
> >> >> >> Thanks in advance for any comments.
>
> >> >> There are two applications:
>
> >> >> The first application has chronological groupings of 4-20 bins, each
> >> grou=3D
> >> >p
> >> >> appearing randomly throughout the spectrum.
>
> >> >do you mean "contiguous" groupings of 4-20 bins?
>
> >> >> For example if there are 100M
> >> >> FFT coefficients, the first three groups have bin indices 104-108,
> >> >> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so
> groups)
> >> ar=3D
> >> >e
> >> >> non-zero.
>
> >> >can two (or more) groupings overlap? =A0if they do, are they
> identified
> >> >simply as a single grouping?
>
> >> >> The second application would be one spectrum containing only one
> >> group's
> >> >> indices. For example, the first spectrum would contain only bins
> >> 104-108
> >> >> non-zero. A separate analysis/spectrum would contain only bins
> >> 42812-4282=3D
> >> >2
> >> >> bins non-zero. etc. For each group there would be a separate
> analysis
> >> >> containing only that groups non-zero indices.
>
> >> >i dunno, perhaps having separate, optimized FFTs for each possible
> >> >group size (like a little FFT for 4 bins, another for 8 bins, another
> >> >for 16 bins, and others for 6 or 12 or 20 bins). =A0you can FFT the
> >> >group (as if it started at bin 0), and then adjust the group's phase
> >> >(with one big phase offset for each group), depending on where the
> >> >group is offset into the whole input spectrum.
>
> >> >r b-j
>
> >> >do you mean "contiguous" groupings of 4-20 bins?
>
> >> YES
>
> >> >can two (or more) groupings overlap? =A0if they do, are they
> identified
> >> >simply as a single grouping?
>
> >> NO
>
> >> (note, I have the FFT coefficients and I'm trying to get the IDFT
> >> computed)
>
> >> Interestingly, the dominant time consumption in computing the IFFT
> using
> >> FFTW is creating the large complex array with mostly zeros. It takes
> long=
> >er
> >> to do this than compute the IFFT. Even if I could optimize the IFFT time
> =
> >to
> >> be very short, I'd still have the overhead of creating the input array.
> >> Maybe this is a Matlab issue, and it'll speed up once in C (anyone know
>
> >In MATLAB, even if X is complex, do not initialize with
>
> >clear all
> >tic
> >for i =3D 1:100
> > � X =3D zeros(20e6,1);
> >end
> >toc � �% elapsed_time =3D 25.891
>
> >%Instead, just use
>
> >clear all
> >tic
> >for i =3D 1:100
> > � X(20e6,1) =3D 0;
> >end
> >toc � % elapsed_time =3D 0.25
>
> >Then fill in the small number of nonzero complex components.
>
> >Hope this helps.
>
> >Greg
>
> Thanks Greg,
>
> I tried your suggestion but it didn't make much of a difference. From the
> best I can tell, regardless of how the mostly-zero array is created, when
> this array is sent into ifft (or fft), matlab converts all the zeros to
> complex zeros and this is the dominant time consumer.
>
> It seems one can either create a complex array of mostly zeros and populate
> with the few non-zero complex numbers (in which case the major time-hit
> occurs up front), or one can create a fast non-complex array and then
> populate it with complex numbers (which also if fast), but then the ifft
> process converts the zeros to complex zeros (and the time-hit occurs at
> this point).- Hide quoted text -
>
> - Show quoted text -
There is still a significant time difference
between just intializing the last element
with, approximately, a complex zero and
initializing all of the elements with complex
zeros.
The approximation is needed because if just
the last element is initialized as a comlex
zero, MATLAB creates a real zero matrix.
% Instead of using
clc, clear all
tic
for i = 1:100
X3 = complex(zeros(10e6,1),zeros(10e6,1));
end
toc % elapsed_time = 62.375
whos
% just use
clear all
tic
for i = 1:100
X4(10e6,1) = 0+j*realmin;
% realmin = 2.2251e-308
end
toc % elapsed_time = 4.437
whos
Hope this helps.
Greg
Reply by all4dsp●January 30, 20112011-01-30
>On Jan 28, 1:32=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
>> >On Jan 28, 1:11=3DA0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net>
wrot=
>e:
>> >> >On Jan 28, 12:43=3D3DA0am, "all4dsp"
<all4dsp@n_o_s_p_a_m.comcast.net=
>>
>> >> >wrote:
>> >> >> Hi,
>>
>> >> >> I need to IDFT a large column array where only something like
>> 0.00001%
>> >> of
>> >> >> the FFT coefficients are non-zero. I've tried using the
definition
>> IDF=3D
>> >T
>> >> b=3D3D
>> >> >ut
>> >> >> find that Matlab (or, FFTW) IFFT function is still faster.
>>
>> >> >> As both of these methods are still too slow, I wonder if anyone
>> could
>> >> >> recommend a pruned IDFT algorithm to speed things up while
>> maintaining
>> >> >> accuracy. My largest array size would be 536M rows x 1 column,
and
>> up
>> >> to
>> >> >> 4000 non-zero FFT coefficients. The data is all real. I'll
implemen=
>t
>> i=3D
>> >n
>> >> >> Matlab as well as C.
>>
>> >> >is there any rhyme or reason to which 4K bins outa 536M are being
>> >> >used? =3DA0is there any grouping or pattern or are they at "random"
>> >> >frequencies?
>>
>> >> >r b-j
>>
>> >> >> Thanks in advance for any comments.
>>
>> >> There are two applications:
>>
>> >> The first application has chronological groupings of 4-20 bins, each
>> grou=3D
>> >p
>> >> appearing randomly throughout the spectrum.
>>
>> >do you mean "contiguous" groupings of 4-20 bins?
>>
>> >> For example if there are 100M
>> >> FFT coefficients, the first three groups have bin indices 104-108,
>> >> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so
groups)
>> ar=3D
>> >e
>> >> non-zero.
>>
>> >can two (or more) groupings overlap? =A0if they do, are they
identified
>> >simply as a single grouping?
>>
>> >> The second application would be one spectrum containing only one
>> group's
>> >> indices. For example, the first spectrum would contain only bins
>> 104-108
>> >> non-zero. A separate analysis/spectrum would contain only bins
>> 42812-4282=3D
>> >2
>> >> bins non-zero. etc. For each group there would be a separate
analysis
>> >> containing only that groups non-zero indices.
>>
>> >i dunno, perhaps having separate, optimized FFTs for each possible
>> >group size (like a little FFT for 4 bins, another for 8 bins, another
>> >for 16 bins, and others for 6 or 12 or 20 bins). =A0you can FFT the
>> >group (as if it started at bin 0), and then adjust the group's phase
>> >(with one big phase offset for each group), depending on where the
>> >group is offset into the whole input spectrum.
>>
>> >r b-j
>>
>> >do you mean "contiguous" groupings of 4-20 bins?
>>
>> YES
>>
>> >can two (or more) groupings overlap? =A0if they do, are they
identified
>> >simply as a single grouping?
>>
>> NO
>>
>> (note, I have the FFT coefficients and I'm trying to get the IDFT
>> computed)
>>
>> Interestingly, the dominant time consumption in computing the IFFT
using
>> FFTW is creating the large complex array with mostly zeros. It takes
long=
>er
>> to do this than compute the IFFT. Even if I could optimize the IFFT time
=
>to
>> be very short, I'd still have the overhead of creating the input array.
>> Maybe this is a Matlab issue, and it'll speed up once in C (anyone know
>
>In MATLAB, even if X is complex, do not initialize with
>
>clear all
>tic
>for i =3D 1:100
> X =3D zeros(20e6,1);
>end
>toc % elapsed_time =3D 25.891
>
>%Instead, just use
>
>clear all
>tic
>for i =3D 1:100
> X(20e6,1) =3D 0;
>end
>toc % elapsed_time =3D 0.25
>
>Then fill in the small number of nonzero complex components.
>
>Hope this helps.
>
>Greg
>
Thanks Greg,
I tried your suggestion but it didn't make much of a difference. From the
best I can tell, regardless of how the mostly-zero array is created, when
this array is sent into ifft (or fft), matlab converts all the zeros to
complex zeros and this is the dominant time consumer.
It seems one can either create a complex array of mostly zeros and populate
with the few non-zero complex numbers (in which case the major time-hit
occurs up front), or one can create a fast non-complex array and then
populate it with complex numbers (which also if fast), but then the ifft
process converts the zeros to complex zeros (and the time-hit occurs at
this point).
Reply by Greg Heath●January 30, 20112011-01-30
On Jan 28, 1:32�am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> >On Jan 28, 1:11=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> >> >On Jan 28, 12:43=3DA0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net>
> >> >wrote:
> >> >> Hi,
>
> >> >> I need to IDFT a large column array where only something like
> 0.00001%
> >> of
> >> >> the FFT coefficients are non-zero. I've tried using the definition
> IDF=
> >T
> >> b=3D
> >> >ut
> >> >> find that Matlab (or, FFTW) IFFT function is still faster.
>
> >> >> As both of these methods are still too slow, I wonder if anyone
> could
> >> >> recommend a pruned IDFT algorithm to speed things up while
> maintaining
> >> >> accuracy. My largest array size would be 536M rows x 1 column, and
> up
> >> to
> >> >> 4000 non-zero FFT coefficients. The data is all real. I'll implement
> i=
> >n
> >> >> Matlab as well as C.
>
> >> >is there any rhyme or reason to which 4K bins outa 536M are being
> >> >used? =A0is there any grouping or pattern or are they at "random"
> >> >frequencies?
>
> >> >r b-j
>
> >> >> Thanks in advance for any comments.
>
> >> There are two applications:
>
> >> The first application has chronological groupings of 4-20 bins, each
> grou=
> >p
> >> appearing randomly throughout the spectrum.
>
> >do you mean "contiguous" groupings of 4-20 bins?
>
> >> For example if there are 100M
> >> FFT coefficients, the first three groups have bin indices 104-108,
> >> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so groups)
> ar=
> >e
> >> non-zero.
>
> >can two (or more) groupings overlap? �if they do, are they identified
> >simply as a single grouping?
>
> >> The second application would be one spectrum containing only one
> group's
> >> indices. For example, the first spectrum would contain only bins
> 104-108
> >> non-zero. A separate analysis/spectrum would contain only bins
> 42812-4282=
> >2
> >> bins non-zero. etc. For each group there would be a separate analysis
> >> containing only that groups non-zero indices.
>
> >i dunno, perhaps having separate, optimized FFTs for each possible
> >group size (like a little FFT for 4 bins, another for 8 bins, another
> >for 16 bins, and others for 6 or 12 or 20 bins). �you can FFT the
> >group (as if it started at bin 0), and then adjust the group's phase
> >(with one big phase offset for each group), depending on where the
> >group is offset into the whole input spectrum.
>
> >r b-j
>
> >do you mean "contiguous" groupings of 4-20 bins?
>
> YES
>
> >can two (or more) groupings overlap? �if they do, are they identified
> >simply as a single grouping?
>
> NO
>
> (note, I have the FFT coefficients and I'm trying to get the IDFT
> computed)
>
> Interestingly, the dominant time consumption in computing the IFFT using
> FFTW is creating the large complex array with mostly zeros. It takes longer
> to do this than compute the IFFT. Even if I could optimize the IFFT time to
> be very short, I'd still have the overhead of creating the input array.
> Maybe this is a Matlab issue, and it'll speed up once in C (anyone know
In MATLAB, even if X is complex, do not initialize with
clear all
tic
for i = 1:100
X = zeros(20e6,1);
end
toc % elapsed_time = 25.891
%Instead, just use
clear all
tic
for i = 1:100
X(20e6,1) = 0;
end
toc % elapsed_time = 0.25
Then fill in the small number of nonzero complex components.
Hope this helps.
Greg
Reply by Tim Wescott●January 28, 20112011-01-28
On 01/28/2011 08:36 AM, all4dsp wrote:
>> On 01/27/2011 10:32 PM, all4dsp wrote:
>>>> On Jan 28, 1:11=A0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net>
> wrote:
>>>>>> On Jan 28, 12:43=3DA0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>
>>>>>>> I need to IDFT a large column array where only something like
>>> 0.00001%
>>>>> of
>>>>>>> the FFT coefficients are non-zero. I've tried using the definition
>>> IDF=
>>>> T
>>>>> b=3D
>>>>>> ut
>>>>>>> find that Matlab (or, FFTW) IFFT function is still faster.
>>>>>
>>>>>>> As both of these methods are still too slow, I wonder if anyone
>>> could
>>>>>>> recommend a pruned IDFT algorithm to speed things up while
>>> maintaining
>>>>>>> accuracy. My largest array size would be 536M rows x 1 column, and
>>> up
>>>>> to
>>>>>>> 4000 non-zero FFT coefficients. The data is all real. I'll
> implement
>>> i=
>>>> n
>>>>>>> Matlab as well as C.
>>>>>
>>>>>> is there any rhyme or reason to which 4K bins outa 536M are being
>>>>>> used? =A0is there any grouping or pattern or are they at "random"
>>>>>> frequencies?
>>>>>
>>>>>> r b-j
>>>>>
>>>>>>> Thanks in advance for any comments.
>>>>>
>>>>> There are two applications:
>>>>>
>>>>> The first application has chronological groupings of 4-20 bins, each
>>> grou=
>>>> p
>>>>> appearing randomly throughout the spectrum.
>>>>
>>>> do you mean "contiguous" groupings of 4-20 bins?
>>>>
>>>>> For example if there are 100M
>>>>> FFT coefficients, the first three groups have bin indices 104-108,
>>>>> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so
> groups)
>>> ar=
>>>> e
>>>>> non-zero.
>>>>
>>>> can two (or more) groupings overlap? if they do, are they identified
>>>> simply as a single grouping?
>>>>
>>>>> The second application would be one spectrum containing only one
>>> group's
>>>>> indices. For example, the first spectrum would contain only bins
>>> 104-108
>>>>> non-zero. A separate analysis/spectrum would contain only bins
>>> 42812-4282=
>>>> 2
>>>>> bins non-zero. etc. For each group there would be a separate analysis
>>>>> containing only that groups non-zero indices.
>>>>
>>>> i dunno, perhaps having separate, optimized FFTs for each possible
>>>> group size (like a little FFT for 4 bins, another for 8 bins, another
>>>> for 16 bins, and others for 6 or 12 or 20 bins). you can FFT the
>>>> group (as if it started at bin 0), and then adjust the group's phase
>>>> (with one big phase offset for each group), depending on where the
>>>> group is offset into the whole input spectrum.
>>>>
>>>> r b-j
>>>>
>>>
>>>
>>>
>>>
>>>> do you mean "contiguous" groupings of 4-20 bins?
>>>
>>> YES
>>>
>>>> can two (or more) groupings overlap? if they do, are they identified
>>>> simply as a single grouping?
>>>
>>> NO
>>>
>>>
>>> (note, I have the FFT coefficients and I'm trying to get the IDFT
>>> computed)
>>>
>>> Interestingly, the dominant time consumption in computing the IFFT
> using
>>> FFTW is creating the large complex array with mostly zeros. It takes
> longer
>>> to do this than compute the IFFT. Even if I could optimize the IFFT time
> to
>>> be very short, I'd still have the overhead of creating the input array.
>>> Maybe this is a Matlab issue, and it'll speed up once in C (anyone
> know?).
>>
>> Matlab has a sparse array format -- how quick does that work? It'll
>> have to convert for IFFT, but it wouldn't have to for your sped-up
> version.
>>
>> How are you creating the array? In Scilab it'd be
>>
>> my_array = zeros(bazzilion, 1);
>>
>> with bazzilion equal to your desired (large) array size.
>>
>> --
>>
>> Tim Wescott
>> Wescott Design Services
>> http://www.wescottdesign.com
>>
>> Do you need to implement control loops in software?
>> "Applied Control Theory for Embedded Systems" was written for you.
>> See details at http://www.wescottdesign.com/actfes/actfes.html
>
>
> I keep it pretty simple. The IDFT definition implementation in Matlab looks
> something like:
>
> for mm = 1 : length(bins)
> ttrend = ttrend + 2*real( fftcoef(mm).*(
> exp(1j*2*pi*(bins(mm)-1)*(nn-1)/FFTn) ));
> end
>
> I could even vectorize the loop using a matrix:
>
> ttrend = 2*real( ( exp(1j*2*pi*(nn-1)*(bins-1)/FFTn) )*fftcoef);
>
> but I found filling the memory with the matrix increases execution time 5x,
> so I leave it as a for loop above. Either way there's no need for sparse
> arrays/matrices. Keeping the exp function is faster than expanding into sin
> and cos, even if you only keep the two real terms after complex
> multiplication with fftcoef.
>
> The FFTW way is:
>
> fft_dj = zeros(FFTn, 1); % preallocate; array turns complex once populate
> it with complex numbers
> fft_dj(bins) = fftcoef; % populate first half of spectrum including Nyquist
> bin
> fft_dj(FFTn-bins+2) = conj(fft_dj(bins)); % populate other half of
> spectrum
> timetrend = real(ifft(fft_dj));
>
> Unfortunately I don't know C so I'm having trouble creating a mex file for
> the IDFT definition method. If anyone's willing to convert it I can supply
> the full code, which is only 10 lines or so.
>
Unless there's someone out there that does MEX files all the time
this'll take an inordinate amount of time -- there's probably an hour or
less to actually implement the algorithm, but possibly a day in all
screwing around getting everything to compile correctly.
It might be time to learn C...
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
Reply by all4dsp●January 28, 20112011-01-28
>On 01/27/2011 10:32 PM, all4dsp wrote:
>>> On Jan 28, 1:11=A0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net>
wrote:
>>>>> On Jan 28, 12:43=3DA0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net>
>>>>> wrote:
>>>>>> Hi,
>>>>
>>>>>> I need to IDFT a large column array where only something like
>> 0.00001%
>>>> of
>>>>>> the FFT coefficients are non-zero. I've tried using the definition
>> IDF=
>>> T
>>>> b=3D
>>>>> ut
>>>>>> find that Matlab (or, FFTW) IFFT function is still faster.
>>>>
>>>>>> As both of these methods are still too slow, I wonder if anyone
>> could
>>>>>> recommend a pruned IDFT algorithm to speed things up while
>> maintaining
>>>>>> accuracy. My largest array size would be 536M rows x 1 column, and
>> up
>>>> to
>>>>>> 4000 non-zero FFT coefficients. The data is all real. I'll
implement
>> i=
>>> n
>>>>>> Matlab as well as C.
>>>>
>>>>> is there any rhyme or reason to which 4K bins outa 536M are being
>>>>> used? =A0is there any grouping or pattern or are they at "random"
>>>>> frequencies?
>>>>
>>>>> r b-j
>>>>
>>>>>> Thanks in advance for any comments.
>>>>
>>>> There are two applications:
>>>>
>>>> The first application has chronological groupings of 4-20 bins, each
>> grou=
>>> p
>>>> appearing randomly throughout the spectrum.
>>>
>>> do you mean "contiguous" groupings of 4-20 bins?
>>>
>>>> For example if there are 100M
>>>> FFT coefficients, the first three groups have bin indices 104-108,
>>>> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so
groups)
>> ar=
>>> e
>>>> non-zero.
>>>
>>> can two (or more) groupings overlap? if they do, are they identified
>>> simply as a single grouping?
>>>
>>>> The second application would be one spectrum containing only one
>> group's
>>>> indices. For example, the first spectrum would contain only bins
>> 104-108
>>>> non-zero. A separate analysis/spectrum would contain only bins
>> 42812-4282=
>>> 2
>>>> bins non-zero. etc. For each group there would be a separate analysis
>>>> containing only that groups non-zero indices.
>>>
>>> i dunno, perhaps having separate, optimized FFTs for each possible
>>> group size (like a little FFT for 4 bins, another for 8 bins, another
>>> for 16 bins, and others for 6 or 12 or 20 bins). you can FFT the
>>> group (as if it started at bin 0), and then adjust the group's phase
>>> (with one big phase offset for each group), depending on where the
>>> group is offset into the whole input spectrum.
>>>
>>> r b-j
>>>
>>
>>
>>
>>
>>> do you mean "contiguous" groupings of 4-20 bins?
>>
>> YES
>>
>>> can two (or more) groupings overlap? if they do, are they identified
>>> simply as a single grouping?
>>
>> NO
>>
>>
>> (note, I have the FFT coefficients and I'm trying to get the IDFT
>> computed)
>>
>> Interestingly, the dominant time consumption in computing the IFFT
using
>> FFTW is creating the large complex array with mostly zeros. It takes
longer
>> to do this than compute the IFFT. Even if I could optimize the IFFT time
to
>> be very short, I'd still have the overhead of creating the input array.
>> Maybe this is a Matlab issue, and it'll speed up once in C (anyone
know?).
>
>Matlab has a sparse array format -- how quick does that work? It'll
>have to convert for IFFT, but it wouldn't have to for your sped-up
version.
>
>How are you creating the array? In Scilab it'd be
>
>my_array = zeros(bazzilion, 1);
>
>with bazzilion equal to your desired (large) array size.
>
>--
>
>Tim Wescott
>Wescott Design Services
>http://www.wescottdesign.com
>
>Do you need to implement control loops in software?
>"Applied Control Theory for Embedded Systems" was written for you.
>See details at http://www.wescottdesign.com/actfes/actfes.html
I keep it pretty simple. The IDFT definition implementation in Matlab looks
something like:
for mm = 1 : length(bins)
ttrend = ttrend + 2*real( fftcoef(mm).*(
exp(1j*2*pi*(bins(mm)-1)*(nn-1)/FFTn) ));
end
I could even vectorize the loop using a matrix:
ttrend = 2*real( ( exp(1j*2*pi*(nn-1)*(bins-1)/FFTn) )*fftcoef);
but I found filling the memory with the matrix increases execution time 5x,
so I leave it as a for loop above. Either way there's no need for sparse
arrays/matrices. Keeping the exp function is faster than expanding into sin
and cos, even if you only keep the two real terms after complex
multiplication with fftcoef.
The FFTW way is:
fft_dj = zeros(FFTn, 1); % preallocate; array turns complex once populate
it with complex numbers
fft_dj(bins) = fftcoef; % populate first half of spectrum including Nyquist
bin
fft_dj(FFTn-bins+2) = conj(fft_dj(bins)); % populate other half of
spectrum
timetrend = real(ifft(fft_dj));
Unfortunately I don't know C so I'm having trouble creating a mex file for
the IDFT definition method. If anyone's willing to convert it I can supply
the full code, which is only 10 lines or so.
Reply by Tim Wescott●January 28, 20112011-01-28
On 01/27/2011 10:32 PM, all4dsp wrote:
>> On Jan 28, 1:11=A0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net> wrote:
>>>> On Jan 28, 12:43=3DA0am, "all4dsp"<all4dsp@n_o_s_p_a_m.comcast.net>
>>>> wrote:
>>>>> Hi,
>>>
>>>>> I need to IDFT a large column array where only something like
> 0.00001%
>>> of
>>>>> the FFT coefficients are non-zero. I've tried using the definition
> IDF=
>> T
>>> b=3D
>>>> ut
>>>>> find that Matlab (or, FFTW) IFFT function is still faster.
>>>
>>>>> As both of these methods are still too slow, I wonder if anyone
> could
>>>>> recommend a pruned IDFT algorithm to speed things up while
> maintaining
>>>>> accuracy. My largest array size would be 536M rows x 1 column, and
> up
>>> to
>>>>> 4000 non-zero FFT coefficients. The data is all real. I'll implement
> i=
>> n
>>>>> Matlab as well as C.
>>>
>>>> is there any rhyme or reason to which 4K bins outa 536M are being
>>>> used? =A0is there any grouping or pattern or are they at "random"
>>>> frequencies?
>>>
>>>> r b-j
>>>
>>>>> Thanks in advance for any comments.
>>>
>>> There are two applications:
>>>
>>> The first application has chronological groupings of 4-20 bins, each
> grou=
>> p
>>> appearing randomly throughout the spectrum.
>>
>> do you mean "contiguous" groupings of 4-20 bins?
>>
>>> For example if there are 100M
>>> FFT coefficients, the first three groups have bin indices 104-108,
>>> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so groups)
> ar=
>> e
>>> non-zero.
>>
>> can two (or more) groupings overlap? if they do, are they identified
>> simply as a single grouping?
>>
>>> The second application would be one spectrum containing only one
> group's
>>> indices. For example, the first spectrum would contain only bins
> 104-108
>>> non-zero. A separate analysis/spectrum would contain only bins
> 42812-4282=
>> 2
>>> bins non-zero. etc. For each group there would be a separate analysis
>>> containing only that groups non-zero indices.
>>
>> i dunno, perhaps having separate, optimized FFTs for each possible
>> group size (like a little FFT for 4 bins, another for 8 bins, another
>> for 16 bins, and others for 6 or 12 or 20 bins). you can FFT the
>> group (as if it started at bin 0), and then adjust the group's phase
>> (with one big phase offset for each group), depending on where the
>> group is offset into the whole input spectrum.
>>
>> r b-j
>>
>
>
>
>
>> do you mean "contiguous" groupings of 4-20 bins?
>
> YES
>
>> can two (or more) groupings overlap? if they do, are they identified
>> simply as a single grouping?
>
> NO
>
>
> (note, I have the FFT coefficients and I'm trying to get the IDFT
> computed)
>
> Interestingly, the dominant time consumption in computing the IFFT using
> FFTW is creating the large complex array with mostly zeros. It takes longer
> to do this than compute the IFFT. Even if I could optimize the IFFT time to
> be very short, I'd still have the overhead of creating the input array.
> Maybe this is a Matlab issue, and it'll speed up once in C (anyone know?).
Matlab has a sparse array format -- how quick does that work? It'll
have to convert for IFFT, but it wouldn't have to for your sped-up version.
How are you creating the array? In Scilab it'd be
my_array = zeros(bazzilion, 1);
with bazzilion equal to your desired (large) array size.
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
Reply by Tim Wescott●January 28, 20112011-01-28
On 01/27/2011 09:43 PM, all4dsp wrote:
> Hi,
>
> I need to IDFT a large column array where only something like 0.00001% of
> the FFT coefficients are non-zero. I've tried using the definition IDFT but
> find that Matlab (or, FFTW) IFFT function is still faster.
>
> As both of these methods are still too slow, I wonder if anyone could
> recommend a pruned IDFT algorithm to speed things up while maintaining
> accuracy. My largest array size would be 536M rows x 1 column, and up to
> 4000 non-zero FFT coefficients. The data is all real. I'll implement in
> Matlab as well as C.
>
> Thanks in advance for any comments.
Are you trying to do your IDFT in Matlab code vs. Matlab's built-in
IFFT? If so, you should see significant speed up using C.
I'm not sure if you could significantly speed this up with some sort of
bastardized IFFT routine that kept track of the butterflies with
non-zero entries -- at some point the effort of book-keeping would
overwhelm the savings of not having to go through everything.
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
Reply by all4dsp●January 28, 20112011-01-28
>On Jan 28, 1:11=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
>> >On Jan 28, 12:43=3DA0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net>
>> >wrote:
>> >> Hi,
>>
>> >> I need to IDFT a large column array where only something like
0.00001%
>> of
>> >> the FFT coefficients are non-zero. I've tried using the definition
IDF=
>T
>> b=3D
>> >ut
>> >> find that Matlab (or, FFTW) IFFT function is still faster.
>>
>> >> As both of these methods are still too slow, I wonder if anyone
could
>> >> recommend a pruned IDFT algorithm to speed things up while
maintaining
>> >> accuracy. My largest array size would be 536M rows x 1 column, and
up
>> to
>> >> 4000 non-zero FFT coefficients. The data is all real. I'll implement
i=
>n
>> >> Matlab as well as C.
>>
>> >is there any rhyme or reason to which 4K bins outa 536M are being
>> >used? =A0is there any grouping or pattern or are they at "random"
>> >frequencies?
>>
>> >r b-j
>>
>> >> Thanks in advance for any comments.
>>
>> There are two applications:
>>
>> The first application has chronological groupings of 4-20 bins, each
grou=
>p
>> appearing randomly throughout the spectrum.
>
>do you mean "contiguous" groupings of 4-20 bins?
>
>> For example if there are 100M
>> FFT coefficients, the first three groups have bin indices 104-108,
>> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so groups)
ar=
>e
>> non-zero.
>
>can two (or more) groupings overlap? if they do, are they identified
>simply as a single grouping?
>
>> The second application would be one spectrum containing only one
group's
>> indices. For example, the first spectrum would contain only bins
104-108
>> non-zero. A separate analysis/spectrum would contain only bins
42812-4282=
>2
>> bins non-zero. etc. For each group there would be a separate analysis
>> containing only that groups non-zero indices.
>
>i dunno, perhaps having separate, optimized FFTs for each possible
>group size (like a little FFT for 4 bins, another for 8 bins, another
>for 16 bins, and others for 6 or 12 or 20 bins). you can FFT the
>group (as if it started at bin 0), and then adjust the group's phase
>(with one big phase offset for each group), depending on where the
>group is offset into the whole input spectrum.
>
>r b-j
>
>do you mean "contiguous" groupings of 4-20 bins?
YES
>can two (or more) groupings overlap? if they do, are they identified
>simply as a single grouping?
NO
(note, I have the FFT coefficients and I'm trying to get the IDFT
computed)
Interestingly, the dominant time consumption in computing the IFFT using
FFTW is creating the large complex array with mostly zeros. It takes longer
to do this than compute the IFFT. Even if I could optimize the IFFT time to
be very short, I'd still have the overhead of creating the input array.
Maybe this is a Matlab issue, and it'll speed up once in C (anyone know?).
Reply by robert bristow-johnson●January 28, 20112011-01-28
On Jan 28, 1:11�am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> >On Jan 28, 12:43=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net>
> >wrote:
> >> Hi,
>
> >> I need to IDFT a large column array where only something like 0.00001%
> of
> >> the FFT coefficients are non-zero. I've tried using the definition IDFT
> b=
> >ut
> >> find that Matlab (or, FFTW) IFFT function is still faster.
>
> >> As both of these methods are still too slow, I wonder if anyone could
> >> recommend a pruned IDFT algorithm to speed things up while maintaining
> >> accuracy. My largest array size would be 536M rows x 1 column, and up
> to
> >> 4000 non-zero FFT coefficients. The data is all real. I'll implement in
> >> Matlab as well as C.
>
> >is there any rhyme or reason to which 4K bins outa 536M are being
> >used? �is there any grouping or pattern or are they at "random"
> >frequencies?
>
> >r b-j
>
> >> Thanks in advance for any comments.
>
> There are two applications:
>
> The first application has chronological groupings of 4-20 bins, each group
> appearing randomly throughout the spectrum.
do you mean "contiguous" groupings of 4-20 bins?
> For example if there are 100M
> FFT coefficients, the first three groups have bin indices 104-108,
> 42812-42822, 89352-89359 (etc., there may be 200 hundred or so groups) are
> non-zero.
can two (or more) groupings overlap? if they do, are they identified
simply as a single grouping?
> The second application would be one spectrum containing only one group's
> indices. For example, the first spectrum would contain only bins 104-108
> non-zero. A separate analysis/spectrum would contain only bins 42812-42822
> bins non-zero. etc. For each group there would be a separate analysis
> containing only that groups non-zero indices.
i dunno, perhaps having separate, optimized FFTs for each possible
group size (like a little FFT for 4 bins, another for 8 bins, another
for 16 bins, and others for 6 or 12 or 20 bins). you can FFT the
group (as if it started at bin 0), and then adjust the group's phase
(with one big phase offset for each group), depending on where the
group is offset into the whole input spectrum.
r b-j
Reply by all4dsp●January 28, 20112011-01-28
>On Jan 28, 12:43=A0am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net>
>wrote:
>> Hi,
>>
>> I need to IDFT a large column array where only something like 0.00001%
of
>> the FFT coefficients are non-zero. I've tried using the definition IDFT
b=
>ut
>> find that Matlab (or, FFTW) IFFT function is still faster.
>>
>> As both of these methods are still too slow, I wonder if anyone could
>> recommend a pruned IDFT algorithm to speed things up while maintaining
>> accuracy. My largest array size would be 536M rows x 1 column, and up
to
>> 4000 non-zero FFT coefficients. The data is all real. I'll implement in
>> Matlab as well as C.
>
>is there any rhyme or reason to which 4K bins outa 536M are being
>used? is there any grouping or pattern or are they at "random"
>frequencies?
>
>r b-j
>
>
>> Thanks in advance for any comments.
>
>
There are two applications:
The first application has chronological groupings of 4-20 bins, each group
appearing randomly throughout the spectrum. For example if there are 100M
FFT coefficients, the first three groups have bin indices 104-108,
42812-42822, 89352-89359 (etc., there may be 200 hundred or so groups) are
non-zero.
The second application would be one spectrum containing only one group's
indices. For example, the first spectrum would contain only bins 104-108
non-zero. A separate analysis/spectrum would contain only bins 42812-42822
bins non-zero. etc. For each group there would be a separate analysis
containing only that groups non-zero indices.