hi all:
i need to do lager size fft(more than 64K);cause the dsplib do not surport it,i
made the code myself(time is not critical).And test it with sine wave(which is
generated by software).when data size is 64K,the result is bad,but when data
size is 1K,the result is very good.So I think my code is OK,and the problem is
caused by float precision, but when I use double to re-code my programme,the
problem is still exists.I am very confused.Can anyone give me some comments
about large size of fft, and we can have some discussion.
my mail is : d...@126.com, thanks
q:large size fft ?
Started by ●September 23, 2007
Reply by ●September 24, 20072007-09-24
Diao Yi Ping-
> thanks jeff, it seems not the memory troubles.because i do not use the lib,so
> theoretically,the max size is 2^31. here it attachs some picture of result.
Just because you don't use a library function, that means it's not a mem problem?
What about your code, what if it has the mem problem?
> i use software to generate the sinewave, and use min max method to compress the
> result data.i am hoping for your advice.
In the pics, your peak has shifted between 1k and 16k. What happens at 2k? At 4k?
You should find the first size that your peak shifts.
Also, you said you generate the sinewave in software? So it's a perfect sinewave,
starts at zero, with number of periods evenly divisible into FFT size? If not, then
you should create such an ideal sinewave. That way, you can expect a "single bin"
result -- if there is a memory problem causing the peak shift -- maybe overlapped
areas or some junk data -- then it's easier to debug.
-Jeff
> 2007-09-21"Jeff Brower" д
>
> Diao Yi Ping
>
> > i need to do lager size fft(more than 64K);cause the dsplib do not
> > surport it,i made the code myself(time is not critical).And test it
> > with sine wave(which is generated by software).when data size is
> > 64K,the result is bad,but when data size is 1K,the result is very
> > good.So I think my code is OK,and the problem is caused by float
> > precision, but when I use double to re-code my programme,the
> > problem is still exists.I am very confused.Can anyone give me some
> > comments about large size of fft, and we can have some discussion.
>
> Did you try 2048 pt? 4096 pt? What is the largest size that your code works?
>
> My guess is precision is not the issue and your code is having some type of memory
> issue -- exceeding array boundary, exceeding heap limit, accessing reserved address
> space, stack overflow, etc.
>
> -Jeff
>
> thanks jeff, it seems not the memory troubles.because i do not use the lib,so
> theoretically,the max size is 2^31. here it attachs some picture of result.
Just because you don't use a library function, that means it's not a mem problem?
What about your code, what if it has the mem problem?
> i use software to generate the sinewave, and use min max method to compress the
> result data.i am hoping for your advice.
In the pics, your peak has shifted between 1k and 16k. What happens at 2k? At 4k?
You should find the first size that your peak shifts.
Also, you said you generate the sinewave in software? So it's a perfect sinewave,
starts at zero, with number of periods evenly divisible into FFT size? If not, then
you should create such an ideal sinewave. That way, you can expect a "single bin"
result -- if there is a memory problem causing the peak shift -- maybe overlapped
areas or some junk data -- then it's easier to debug.
-Jeff
> 2007-09-21"Jeff Brower" д
>
> Diao Yi Ping
>
> > i need to do lager size fft(more than 64K);cause the dsplib do not
> > surport it,i made the code myself(time is not critical).And test it
> > with sine wave(which is generated by software).when data size is
> > 64K,the result is bad,but when data size is 1K,the result is very
> > good.So I think my code is OK,and the problem is caused by float
> > precision, but when I use double to re-code my programme,the
> > problem is still exists.I am very confused.Can anyone give me some
> > comments about large size of fft, and we can have some discussion.
>
> Did you try 2048 pt? 4096 pt? What is the largest size that your code works?
>
> My guess is precision is not the issue and your code is having some type of memory
> issue -- exceeding array boundary, exceeding heap limit, accessing reserved address
> space, stack overflow, etc.
>
> -Jeff
>
Reply by ●September 24, 20072007-09-24
In addition to Jeff's notes, have you checked the twiddle factors table,
does its size match, is it bit (or digit) -reversed (if any) correctly?
It could be that either the FFT algorithm itself or the twiddle factor table
generating algorithm are not generic and restricted to a certain maximum
length. By generic I mean one that process any appropriate size array -
a power of 2 or 4 or a combination, etc. E.g. the bit/digit reversing
code supplied with TI's DSP library is probably size restricted, since
it uses a pre-built table of reversed indices (it needs to be checked,
I do not recall this with unit probability).
Floating point accuracy is not quite relevant here.
Regards,
Andrew
> Subject: q:large size fft ?
> Posted by: "d...@126.com" d...@126.com dyp_and_jmy
> Date: Fri Sep 21, 2007 5:37 am ((PDT))
>
> hi all:
> i need to do lager size fft (more than 64K); cause the dsplib do not surport
> it, i made the code myself (time is not critical). And test it with sine
> wave (which is generated by software). when data size is 64K, the result is
> bad, but when data size is 1K, the result is very good. So I think my code is
> OK, and the problem is caused by float precision, but when I use double to
> re-code my programme, the problem is still exists. I am very confused. Can
> anyone give me some comments about large size of fft, and we can have some
> discussion.
> my mail is : d...@126.com, thanks
does its size match, is it bit (or digit) -reversed (if any) correctly?
It could be that either the FFT algorithm itself or the twiddle factor table
generating algorithm are not generic and restricted to a certain maximum
length. By generic I mean one that process any appropriate size array -
a power of 2 or 4 or a combination, etc. E.g. the bit/digit reversing
code supplied with TI's DSP library is probably size restricted, since
it uses a pre-built table of reversed indices (it needs to be checked,
I do not recall this with unit probability).
Floating point accuracy is not quite relevant here.
Regards,
Andrew
> Subject: q:large size fft ?
> Posted by: "d...@126.com" d...@126.com dyp_and_jmy
> Date: Fri Sep 21, 2007 5:37 am ((PDT))
>
> hi all:
> i need to do lager size fft (more than 64K); cause the dsplib do not surport
> it, i made the code myself (time is not critical). And test it with sine
> wave (which is generated by software). when data size is 64K, the result is
> bad, but when data size is 1K, the result is very good. So I think my code is
> OK, and the problem is caused by float precision, but when I use double to
> re-code my programme, the problem is still exists. I am very confused. Can
> anyone give me some comments about large size of fft, and we can have some
> discussion.
> my mail is : d...@126.com, thanks
Reply by ●September 25, 20072007-09-25
thanks for the apply,now I attach the code here:
typedef struct tagComplexData
{
float real;
float imag;
}COMPLEX,*LPCOMPLEX;
void ComplexFFT(COMPLEX*xin,int N)
{
register int f,m,nv2,nm1,i,k,j=1,l ;
/*int f,m,nv2,nm1,i,k,j=N/2,l;*/
register COMPLEX v,w,t,x1,x2 ;
nv2=N/2 ;
f=N ;
for(m=1,f>>=1;f!=1;m++)
{
f>>=1;
}
nm1=N-1 ;
//bit reverse/
for(i=1;i<=nm1;i++)
{
if(i {
t=xin[j-1];
xin[j-1]=xin[i-1];
xin[i-1]=t ;
}
k=nv2 ;
while(k {
j=j-k ;
k=k/2 ;
}
j=j+k ;
}
register int le,lei,ip ;
register float pi=3.1415926,temp ;
for(l=1;l<=m;l++)
{
//le=pow(2,l);
le=1< // is L not 1 !!!!
lei=le/2 ;
v.real=1.0 ;
v.imag=0.0 ;
w.real=cossp(pi/lei);
w.imag=-sinsp(pi/lei);
for(j=1;j<=lei;j++)
{
for(i=j;i<=N;i=i+le)
{
ip=i+lei ;
x1=xin[ip-1];
x2=xin[i-1];
t.real=x1.real*v.real-x1.imag*v.imag;
t.imag=x1.real*v.imag+x1.imag*v.real;
xin[ip-1].real=x2.real-t.real ;
xin[ip-1].imag=x2.imag-t.imag ;
xin[i-1].real=x2.real+t.real ;
xin[i-1].imag=x2.imag+t.imag ;
}
//v=ComplexDataMul(v,w);
temp=v.real*w.real-v.imag*w.imag;
v.imag=v.real*w.imag+v.imag*w.real;
v.real=temp;
}
}
return ;
}
I think the code is right(i check it with 8-point fft,and the result is same as the result i caculate myself in hand),but when it is used to comput 1K,the result is worse than the dsplib function, when the size is large, the result is unbearable.It troubles me a lot.
2007-09-24"Jeff Brower" д
Diao Yi Ping-
thanks jeff, it seems not the memory troubles.because i do not use the lib,so theoretically,the max size is 2^31. here it attachs some picture of result.
Just because you don't use a library function, that means it's not a mem problem? What about your code, what if it has the mem problem?
i use software to generate the sinewave, and use min max method to compress the result data.i am hoping for your advice.
In the pics, your peak has shifted between 1k and 16k. What happens at 2k? At 4k? You should find the first size that your peak shifts.
Also, you said you generate the sinewave in software? So it's a perfect sinewave, starts at zero, with number of periods evenly divisible into FFT size? If not, then you should create such an ideal sinewave. That way, you can expect a "single bin" result -- if there is a memory problem causing the peak shift -- maybe overlapped areas or some junk data -- then it's easier to debug.
-Jeff
2007-09-21"Jeff Brower" д
Diao Yi Ping
> i need to do lager size fft(more than 64K);cause the dsplib do not
> surport it,i made the code myself(time is not critical).And test it
> with sine wave(which is generated by software).when data size is
> 64K,the result is bad,but when data size is 1K,the result is very
> good.So I think my code is OK,and the problem is caused by float
> precision, but when I use double to re-code my programme,the
> problem is still exists.I am very confused.Can anyone give me some
> comments about large size of fft, and we can have some discussion.
Did you try 2048 pt? 4096 pt? What is the largest size that your code works?
My guess is precision is not the issue and your code is having some type of memory
issue -- exceeding array boundary, exceeding heap limit, accessing reserved address
space, stack overflow, etc.
-Jeff
typedef struct tagComplexData
{
float real;
float imag;
}COMPLEX,*LPCOMPLEX;
void ComplexFFT(COMPLEX*xin,int N)
{
register int f,m,nv2,nm1,i,k,j=1,l ;
/*int f,m,nv2,nm1,i,k,j=N/2,l;*/
register COMPLEX v,w,t,x1,x2 ;
nv2=N/2 ;
f=N ;
for(m=1,f>>=1;f!=1;m++)
{
f>>=1;
}
nm1=N-1 ;
//bit reverse/
for(i=1;i<=nm1;i++)
{
if(i {
t=xin[j-1];
xin[j-1]=xin[i-1];
xin[i-1]=t ;
}
k=nv2 ;
while(k {
j=j-k ;
k=k/2 ;
}
j=j+k ;
}
register int le,lei,ip ;
register float pi=3.1415926,temp ;
for(l=1;l<=m;l++)
{
//le=pow(2,l);
le=1< // is L not 1 !!!!
lei=le/2 ;
v.real=1.0 ;
v.imag=0.0 ;
w.real=cossp(pi/lei);
w.imag=-sinsp(pi/lei);
for(j=1;j<=lei;j++)
{
for(i=j;i<=N;i=i+le)
{
ip=i+lei ;
x1=xin[ip-1];
x2=xin[i-1];
t.real=x1.real*v.real-x1.imag*v.imag;
t.imag=x1.real*v.imag+x1.imag*v.real;
xin[ip-1].real=x2.real-t.real ;
xin[ip-1].imag=x2.imag-t.imag ;
xin[i-1].real=x2.real+t.real ;
xin[i-1].imag=x2.imag+t.imag ;
}
//v=ComplexDataMul(v,w);
temp=v.real*w.real-v.imag*w.imag;
v.imag=v.real*w.imag+v.imag*w.real;
v.real=temp;
}
}
return ;
}
I think the code is right(i check it with 8-point fft,and the result is same as the result i caculate myself in hand),but when it is used to comput 1K,the result is worse than the dsplib function, when the size is large, the result is unbearable.It troubles me a lot.
2007-09-24"Jeff Brower" д
Diao Yi Ping-
thanks jeff, it seems not the memory troubles.because i do not use the lib,so theoretically,the max size is 2^31. here it attachs some picture of result.
Just because you don't use a library function, that means it's not a mem problem? What about your code, what if it has the mem problem?
i use software to generate the sinewave, and use min max method to compress the result data.i am hoping for your advice.
In the pics, your peak has shifted between 1k and 16k. What happens at 2k? At 4k? You should find the first size that your peak shifts.
Also, you said you generate the sinewave in software? So it's a perfect sinewave, starts at zero, with number of periods evenly divisible into FFT size? If not, then you should create such an ideal sinewave. That way, you can expect a "single bin" result -- if there is a memory problem causing the peak shift -- maybe overlapped areas or some junk data -- then it's easier to debug.
-Jeff
2007-09-21"Jeff Brower" д
Diao Yi Ping
> i need to do lager size fft(more than 64K);cause the dsplib do not
> surport it,i made the code myself(time is not critical).And test it
> with sine wave(which is generated by software).when data size is
> 64K,the result is bad,but when data size is 1K,the result is very
> good.So I think my code is OK,and the problem is caused by float
> precision, but when I use double to re-code my programme,the
> problem is still exists.I am very confused.Can anyone give me some
> comments about large size of fft, and we can have some discussion.
Did you try 2048 pt? 4096 pt? What is the largest size that your code works?
My guess is precision is not the issue and your code is having some type of memory
issue -- exceeding array boundary, exceeding heap limit, accessing reserved address
space, stack overflow, etc.
-Jeff
Reply by ●September 25, 20072007-09-25
hello,
As you say time is not critical, have you tried to do it as a DFT ?
it is easier, more straightforward and will probably avoid you some truncations.
I would suggest you do a test with random data, 1 sinewave won't be enough for proof.
What is your reference? Matlab FFT?
From: d...@126.com
Reply-To: d...@126.com
To: c...
Subject: [c6x] q:large size fft ?
Date: Fri, 21 Sep 2007 04:06:22 -0400
>hi all:
>i need to do lager size fft(more than 64K);cause the dsplib do not surport it,i made the code myself(time is not critical).And test it with sine wave(which is generated by software).when data size is 64K,the result is bad,but when data size is 1K,the result is very good.So I think my code is OK,and the problem is caused by float precision, but when I use double to re-code my programme,the problem is still exists.I am very confused.Can anyone give me some comments about large size of fft, and we can have some discussion.
>my mail is : d...@126.com, thanks
__._,_.___
">http://www.dsprelated.com/groups/c6x/1.php
_____________________________________
Note: If you do a simple "reply" with your email client, only the author of this message will receive your answer. You need to do a "reply all" if you want your answer to be distributed to the entire group.
_____________________________________
About this discussion group:
Archives: http://www.dsprelated.com/groups/c6x/1.php
To Post: Send an email to c...
Other DSP Related Groups: http://www.dsprelated.com/groups.php
![]()
__,_._,___
Reply by ●October 15, 20072007-10-15
hi all:
thanks for your previous advices,i check my code carefully, and find out some mistakes. now i test my code again with 1K ,16K,128K(float)theoretical sine wave ,attatched is the result.
Again,when input size is 1K, the result is correct(no DC, and the frequency value is below 130dB),but when the size is larger, the result is worse, is the phenomenon right or still there is something wrong ?
2007-09-26"Jeff Brower" д
Diao Yi Ping-
I think the code is right(i check it with 8-point fft,and the result is same as the result i caculate myself in hand),but when it is used to comput 1K,the result is worse than the dsplib function, when the size is large, the result is unbearable.It troubles me a lot.
The problem you are facing is lack of debug skills and discipline. You need to go step-by-step, very carefully, without jumping to conclusions like "memory problem". Various people on the group have asked you questions like:
-if 8-pt works, then what happens with 16-pt? What is
the exact difference?
-did you separate bit-reverse and test it separately?
-did you compare DSP results with Linux C code or with
MATLAB?
-what happens with all-one input? With a single unit
function (1 followed by zeros)? With the "size-matched
sinewave" function I mentioned in previous e-mail?
These are the questions you should answer if you expect more help from the group.
-Jeff
thanks for the apply,now I attach the code here: typedef struct tagComplexData
{
float real;
float imag;}COMPLEX,*LPCOMPLEX; void ComplexFFT(COMPLEX*xin,int N)
{
register int f,m,nv2,nm1,i,k,j=1,l ;
/*int f,m,nv2,nm1,i,k,j=N/2,l;*/
register COMPLEX v,w,t,x1,x2 ;
nv2=N/2 ;
f=N ; for(m=1,f>>=1;f!=1;m++)
{
f>>=1;
}
nm1=N-1 ;
//bit reverse/
for(i=1;i<=nm1;i++)
{
if(i {
t=xin[j-1];
xin[j-1]=xin[i-1];
xin[i-1]=t ;
}
k=nv2 ;
while(k {
j=j-k ;
k=k/2 ;
}
j=j+k ;
}
register int le,lei,ip ;
register float pi=3.1415926,temp ;
for(l=1;l<=m;l++)
{
//le=pow(2,l);
le=1< // is L not 1 !!!!
lei=le/2 ;
v.real=1.0 ;
v.imag=0.0 ;
w.real=cossp(pi/lei);
w.imag=-sinsp(pi/lei);
for(j=1;j<=lei;j++)
{
for(i=j;i<=N;i=i+le)
{
ip=i+lei ;
x1=xin[ip-1];
x2=xin[i-1]; t.real=x1.real*v.real-x1.imag*v.imag;
t.imag=x1.real*v.imag+x1.imag*v.real; xin[ip-1].real=x2.real-t.real ;
xin[ip-1].imag=x2.imag-t.imag ;
xin[i-1].real=x2.real+t.real ;
xin[i-1].imag=x2.imag+t.imag ; }
//v=ComplexDataMul(v,w);
temp=v.real*w.real-v.imag*w.imag;
v.imag=v.real*w.imag+v.imag*w.real;
v.real=temp; }
}
return ;
} I think the code is right(i check it with 8-point fft,and the result is same as the result i caculate myself in hand),but when it is used to comput 1K,the result is worse than the dsplib function, when the size is large, the result is unbearable.It troubles me a lot. 2007-09-24"Jeff Brower" д
Diao Yi Ping-
thanks jeff, it seems not the memory troubles.because i do not use the lib,so theoretically,the max size is 2^31. here it attachs some picture of result.
Just because you don't use a library function, that means it's not a mem problem? What about your code, what if it has the mem problem?
i use software to generate the sinewave, and use min max method to compress the result data.i am hoping for your advice.
In the pics, your peak has shifted between 1k and 16k. What happens at 2k? At 4k? You should find the first size that your peak shifts.
Also, you said you generate the sinewave in software? So it's a perfect sinewave, starts at zero, with number of periods evenly divisible into FFT size? If not, then you should create such an ideal sinewave. That way, you can expect a "single bin" result -- if there is a memory problem causing the peak shift -- maybe overlapped areas or some junk data -- then it's easier to debug.
-Jeff
2007-09-21"Jeff Brower" д
Diao Yi Ping
> i need to do lager size fft(more than 64K);cause the dsplib do not
> surport it,i made the code myself(time is not critical).And test it
> with sine wave(which is generated by software).when data size is
> 64K,the result is bad,but when data size is 1K,the result is very
> good.So I think my code is OK,and the problem is caused by float
> precision, but when I use double to re-code my programme,the
> problem is still exists.I am very confused.Can anyone give me some
> comments about large size of fft, and we can have some discussion.
Did you try 2048 pt? 4096 pt? What is the largest size that your code works?
My guess is precision is not the issue and your code is having some type of memory
issue -- exceeding array boundary, exceeding heap limit, accessing reserved address
space, stack overflow, etc.
-Jeff
ȫ ְ Ů Ů 2 0 0 7
thanks for your previous advices,i check my code carefully, and find out some mistakes. now i test my code again with 1K ,16K,128K(float)theoretical sine wave ,attatched is the result.
Again,when input size is 1K, the result is correct(no DC, and the frequency value is below 130dB),but when the size is larger, the result is worse, is the phenomenon right or still there is something wrong ?
2007-09-26"Jeff Brower" д
Diao Yi Ping-
I think the code is right(i check it with 8-point fft,and the result is same as the result i caculate myself in hand),but when it is used to comput 1K,the result is worse than the dsplib function, when the size is large, the result is unbearable.It troubles me a lot.
The problem you are facing is lack of debug skills and discipline. You need to go step-by-step, very carefully, without jumping to conclusions like "memory problem". Various people on the group have asked you questions like:
-if 8-pt works, then what happens with 16-pt? What is
the exact difference?
-did you separate bit-reverse and test it separately?
-did you compare DSP results with Linux C code or with
MATLAB?
-what happens with all-one input? With a single unit
function (1 followed by zeros)? With the "size-matched
sinewave" function I mentioned in previous e-mail?
These are the questions you should answer if you expect more help from the group.
-Jeff
thanks for the apply,now I attach the code here: typedef struct tagComplexData
{
float real;
float imag;}COMPLEX,*LPCOMPLEX; void ComplexFFT(COMPLEX*xin,int N)
{
register int f,m,nv2,nm1,i,k,j=1,l ;
/*int f,m,nv2,nm1,i,k,j=N/2,l;*/
register COMPLEX v,w,t,x1,x2 ;
nv2=N/2 ;
f=N ; for(m=1,f>>=1;f!=1;m++)
{
f>>=1;
}
nm1=N-1 ;
//bit reverse/
for(i=1;i<=nm1;i++)
{
if(i {
t=xin[j-1];
xin[j-1]=xin[i-1];
xin[i-1]=t ;
}
k=nv2 ;
while(k {
j=j-k ;
k=k/2 ;
}
j=j+k ;
}
register int le,lei,ip ;
register float pi=3.1415926,temp ;
for(l=1;l<=m;l++)
{
//le=pow(2,l);
le=1< // is L not 1 !!!!
lei=le/2 ;
v.real=1.0 ;
v.imag=0.0 ;
w.real=cossp(pi/lei);
w.imag=-sinsp(pi/lei);
for(j=1;j<=lei;j++)
{
for(i=j;i<=N;i=i+le)
{
ip=i+lei ;
x1=xin[ip-1];
x2=xin[i-1]; t.real=x1.real*v.real-x1.imag*v.imag;
t.imag=x1.real*v.imag+x1.imag*v.real; xin[ip-1].real=x2.real-t.real ;
xin[ip-1].imag=x2.imag-t.imag ;
xin[i-1].real=x2.real+t.real ;
xin[i-1].imag=x2.imag+t.imag ; }
//v=ComplexDataMul(v,w);
temp=v.real*w.real-v.imag*w.imag;
v.imag=v.real*w.imag+v.imag*w.real;
v.real=temp; }
}
return ;
} I think the code is right(i check it with 8-point fft,and the result is same as the result i caculate myself in hand),but when it is used to comput 1K,the result is worse than the dsplib function, when the size is large, the result is unbearable.It troubles me a lot. 2007-09-24"Jeff Brower" д
Diao Yi Ping-
thanks jeff, it seems not the memory troubles.because i do not use the lib,so theoretically,the max size is 2^31. here it attachs some picture of result.
Just because you don't use a library function, that means it's not a mem problem? What about your code, what if it has the mem problem?
i use software to generate the sinewave, and use min max method to compress the result data.i am hoping for your advice.
In the pics, your peak has shifted between 1k and 16k. What happens at 2k? At 4k? You should find the first size that your peak shifts.
Also, you said you generate the sinewave in software? So it's a perfect sinewave, starts at zero, with number of periods evenly divisible into FFT size? If not, then you should create such an ideal sinewave. That way, you can expect a "single bin" result -- if there is a memory problem causing the peak shift -- maybe overlapped areas or some junk data -- then it's easier to debug.
-Jeff
2007-09-21"Jeff Brower" д
Diao Yi Ping
> i need to do lager size fft(more than 64K);cause the dsplib do not
> surport it,i made the code myself(time is not critical).And test it
> with sine wave(which is generated by software).when data size is
> 64K,the result is bad,but when data size is 1K,the result is very
> good.So I think my code is OK,and the problem is caused by float
> precision, but when I use double to re-code my programme,the
> problem is still exists.I am very confused.Can anyone give me some
> comments about large size of fft, and we can have some discussion.
Did you try 2048 pt? 4096 pt? What is the largest size that your code works?
My guess is precision is not the issue and your code is having some type of memory
issue -- exceeding array boundary, exceeding heap limit, accessing reserved address
space, stack overflow, etc.
-Jeff
ȫ ְ Ů Ů 2 0 0 7
Reply by ●October 16, 20072007-10-16
Hi Diao Yi Ping,
> Subject: Re: q:large size fft ?
> Posted by: "d...@126.com" d...@126.com dyp_and_jmy
> Date: Mon Oct 15, 2007 4:36 am ((PDT))
>
> hi all:
> thanks for your previous advices,i check my code carefully, and find
> out some mistakes. now i test my code again with 1K, 16K, 128K (float)
> theoretical sine wave, attatched is the result.
>
> Again,when input size is 1K, the result is correct(no DC, and the frequency
> value is below 130dB),but when the size is larger, the result is worse, is
> the phenomenon right or still there is something wrong ?
Yes the phenomenon shows up predictably and yes there is something that causes
the problem:
Usually, floating point recurrences (look for the lines marked "// REC" in
your code below) are used when iterations quickly converge. Apparently, in the
case of trig recurrences, iterations does not converge, hence calculation errors
accumulate and in the case of large fft size may seriously affect accuracy of
calculated twiddle factors. I.e. the twiddle factors calculated recurrently may
significantly deviate from their exact (in the floating point sense) values.
E.g. for N8K there are 64K iterations of the inner (second) loop at maximum,
and the same number of recurrences to calculate twiddle factors.
The workaround is quite simple: use a short table of exp(-2*pi*k*l/N) twiddles
and more elaborate indexing method into this table. There are quite a bit of
FFT codes on the net that use this method. To begin with, please take a look at
http://www.jjj.de/fxt/
The term W(k,l) = exp(-2*pi*i*k*l/N) is a periodic function of N, W(k) = W(k+N*m),
therefore the table needs to keep a number on the order of N values of W().
Rgds,
Andrew
> [...] I attach the code here:
>
> typedef struct tagComplexData
> {
> float real;
> float imag;
> } COMPLEX, *LPCOMPLEX;
>
> void ComplexFFT (COMPLEX * xin, int N)
> {
> register int f, m, nv2, nm1, i, k, j = 1, l;
> /* int f, m, nv2, nm1, i, k, j = N/2, l; */
> register COMPLEX v, w, t, x1, x2;
> nv2=N/2 ;
> f=N ;
>
> for (m = 1, f >>= 1; f != 1; m++)
> {
> f>>=1;
> }
>
> nm1=N-1 ;
>
> //bit reverse/
> for (i = 1; i <= nm1; i++)
> {
> if(i < j)
> {
> t = xin[j-1];
> xin[j-1] = xin[i-1];
> xin[i-1] = t;
> }
> k = nv2;
> while (k < j)
> {
> j = j - k;
> k = k / 2;
> }
> j = j + k;
> }
>
> register int le, lei, ip;
> register float pi = 3.1415926, temp;
>
> for (l = 1; l <= m; l++)
> {
> // le = pow (2, l);
> le = 1 << l;
> // is L not 1 !!!!
> lei = le / 2;
> v.real = 1.0;
> v.imag = 0.0;
> w.real = cossp (pi / lei); // REC 0th iteration
> w.imag =-sinsp (pi / lei); // REC 0th iteration
>
> for (j = 1; j <= lei; j++)
> {
> for (i = j; i <= N; i = i + le)
> {
> ip = i + lei ;
> x1 = xin[ip-1];
> x2 = xin[i-1];
> t.real = x1.real * v.real - x1.imag * v.imag;
> t.imag = x1.real * v.imag + x1.imag * v.real;
> xin[ip-1].real = x2.real - t.real;
> xin[ip-1].imag = x2.imag - t.imag;
> xin[i-1].real = x2.real + t.real;
> xin[i-1].imag = x2.imag + t.imag;
> }
>
> // v = ComplexDataMul (v, w);
> temp = v.real * w.real - v.imag * w.imag; // REC
> v.imag = v.real * w.imag + v.imag * w.real; // REC
> v.real = temp; // REC
> }
> }
>
> return;
> }
>
> Subject: Re: q:large size fft ?
> Posted by: "d...@126.com" d...@126.com dyp_and_jmy
> Date: Mon Oct 15, 2007 4:36 am ((PDT))
>
> hi all:
> thanks for your previous advices,i check my code carefully, and find
> out some mistakes. now i test my code again with 1K, 16K, 128K (float)
> theoretical sine wave, attatched is the result.
>
> Again,when input size is 1K, the result is correct(no DC, and the frequency
> value is below 130dB),but when the size is larger, the result is worse, is
> the phenomenon right or still there is something wrong ?
Yes the phenomenon shows up predictably and yes there is something that causes
the problem:
Usually, floating point recurrences (look for the lines marked "// REC" in
your code below) are used when iterations quickly converge. Apparently, in the
case of trig recurrences, iterations does not converge, hence calculation errors
accumulate and in the case of large fft size may seriously affect accuracy of
calculated twiddle factors. I.e. the twiddle factors calculated recurrently may
significantly deviate from their exact (in the floating point sense) values.
E.g. for N8K there are 64K iterations of the inner (second) loop at maximum,
and the same number of recurrences to calculate twiddle factors.
The workaround is quite simple: use a short table of exp(-2*pi*k*l/N) twiddles
and more elaborate indexing method into this table. There are quite a bit of
FFT codes on the net that use this method. To begin with, please take a look at
http://www.jjj.de/fxt/
The term W(k,l) = exp(-2*pi*i*k*l/N) is a periodic function of N, W(k) = W(k+N*m),
therefore the table needs to keep a number on the order of N values of W().
Rgds,
Andrew
> [...] I attach the code here:
>
> typedef struct tagComplexData
> {
> float real;
> float imag;
> } COMPLEX, *LPCOMPLEX;
>
> void ComplexFFT (COMPLEX * xin, int N)
> {
> register int f, m, nv2, nm1, i, k, j = 1, l;
> /* int f, m, nv2, nm1, i, k, j = N/2, l; */
> register COMPLEX v, w, t, x1, x2;
> nv2=N/2 ;
> f=N ;
>
> for (m = 1, f >>= 1; f != 1; m++)
> {
> f>>=1;
> }
>
> nm1=N-1 ;
>
> //bit reverse/
> for (i = 1; i <= nm1; i++)
> {
> if(i < j)
> {
> t = xin[j-1];
> xin[j-1] = xin[i-1];
> xin[i-1] = t;
> }
> k = nv2;
> while (k < j)
> {
> j = j - k;
> k = k / 2;
> }
> j = j + k;
> }
>
> register int le, lei, ip;
> register float pi = 3.1415926, temp;
>
> for (l = 1; l <= m; l++)
> {
> // le = pow (2, l);
> le = 1 << l;
> // is L not 1 !!!!
> lei = le / 2;
> v.real = 1.0;
> v.imag = 0.0;
> w.real = cossp (pi / lei); // REC 0th iteration
> w.imag =-sinsp (pi / lei); // REC 0th iteration
>
> for (j = 1; j <= lei; j++)
> {
> for (i = j; i <= N; i = i + le)
> {
> ip = i + lei ;
> x1 = xin[ip-1];
> x2 = xin[i-1];
> t.real = x1.real * v.real - x1.imag * v.imag;
> t.imag = x1.real * v.imag + x1.imag * v.real;
> xin[ip-1].real = x2.real - t.real;
> xin[ip-1].imag = x2.imag - t.imag;
> xin[i-1].real = x2.real + t.real;
> xin[i-1].imag = x2.imag + t.imag;
> }
>
> // v = ComplexDataMul (v, w);
> temp = v.real * w.real - v.imag * w.imag; // REC
> v.imag = v.real * w.imag + v.imag * w.real; // REC
> v.real = temp; // REC
> }
> }
>
> return;
> }
>
Reply by ●October 18, 20072007-10-18
hi Andrew and all:
I change my code as follow:
void ComplexFFT2(COMPLEX*xin,int n)
{
register float w_re, w_im, t_re, t_im;
register int m, g, b;
register int i, mt, k;
register float pi=3.1415926;
register int bev=0;
register int numbits=1,log2n=n;
register float x1_re,x1_im,x2_re,x2_im;
for(numbits=1,log2n>>=1;log2n!=1;numbits++)
{
log2n>>=1;
}
/* for each stage */
for (m=n; m>=2; m=m>>1)
{
/* m = n/2^s; mt = m/2; */
mt = m >> 1;
/* for each group of butterfly */
for (g=0,k=0; g {
bev=bitrev(k,numbits-1 );
w_re = cossp(2*pi*bev/n); //REC
w_im = -sinsp(2*pi*bev/n); //REC
/* for each butterfly */
for (b=g; b<(g+mt); b++)
{
x1_re=xin[b].real;
x1_im=xin[b].imag;
x2_re=xin[b+mt].real;
x2_im=xin[b+mt].imag;
t_re = w_re * x2_re - w_im * x2_im;
t_im = w_re * x2_im + w_im * x2_re;
xin[b].real = x1_re + t_re;
xin[b].imag = x1_im + t_im;
xin[b+mt].real = x1_re - t_re;
xin[b+mt].imag = x1_im - t_im;
}
}
}
BitReverse_complex(xin, n);
return ;
}
in new code,there is no recurrence caculation for the twiddle factor,and the result is much better,thanks.
but when the input size is large(128k etc.),the result is still unbearable unless I use 'double float' to recode it.
I think 'float's precision is not enough for the large size fft, the accumulation of the trunction error from butterfly effects the final result.
Do I must use 'double ' if I want to do large size fft and get a high precious result ?
Is there some trics to get result better without using 'double'?
2007-10-17"Andrew Nesterov" д
Hi Diao Yi Ping,
> Subject: Re: q:large size fft ?
> Posted by: "diaoyiping...@126.com" d...@126.com dyp_and_jmy
> Date: Mon Oct 15, 2007 4:36 am ((PDT))
>
> hi all:
> thanks for your previous advices,i check my code carefully, and find
> out some mistakes. now i test my code again with 1K, 16K, 128K (float)
> theoretical sine wave, attatched is the result.
>
> Again,when input size is 1K, the result is correct(no DC, and the frequency
> value is below 130dB),but when the size is larger, the result is worse, is
> the phenomenon right or still there is something wrong ?
Yes the phenomenon shows up predictably and yes there is something that causes
the problem:
Usually, floating point recurrences (look for the lines marked "// REC" in
your code below) are used when iterations quickly converge. Apparently, in the
case of trig recurrences, iterations does not converge, hence calculation errors
accumulate and in the case of large fft size may seriously affect accuracy of
calculated twiddle factors. I.e. the twiddle factors calculated recurrently may
significantly deviate from their exact (in the floating point sense) values.
E.g. for N=128K there are 64K iterations of the inner (second) loop at maximum,
and the same number of recurrences to calculate twiddle factors.
The workaround is quite simple: use a short table of exp(-2*pi*k*l/N) twiddles
and more elaborate indexing method into this table. There are quite a bit of
FFT codes on the net that use this method. To begin with, please take a look at
http://www.jjj.de/fxt/
The term W(k,l) = exp(-2*pi*i*k*l/N) is a periodic function of N, W(k) = W(k+N*m),
therefore the table needs to keep a number on the order of N values of W().
Rgds,
Andrew
> [...] I attach the code here:
>
> typedef struct tagComplexData
> {
> float real;
> float imag;
> } COMPLEX, *LPCOMPLEX;
>
> void ComplexFFT (COMPLEX * xin, int N)
> {
> register int f, m, nv2, nm1, i, k, j = 1, l;
> /* int f, m, nv2, nm1, i, k, j = N/2, l; */
> register COMPLEX v, w, t, x1, x2;
> nv2=N/2 ;
> f=N ;
>
> for (m = 1, f >>= 1; f != 1; m++)
> {
> f>>=1;
> }
>
> nm1=N-1 ;
>
> //bit reverse/
> for (i = 1; i <= nm1; i++)
> {
> if(i < j)
> {
> t = xin[j-1];
> xin[j-1] = xin[i-1];
> xin[i-1] = t;
> }
> k = nv2;
> while (k < j)
> {
> j = j - k;
> k = k / 2;
> }
> j = j + k;
> }
>
> register int le, lei, ip;
> register float pi = 3.1415926, temp;
>
> for (l = 1; l <= m; l++)
> {
> // le = pow (2, l);
> le = 1 << l;
> // is L not 1 !!!!
> lei = le / 2;
> v.real = 1.0;
> v.imag = 0.0;
> w.real = cossp (pi / lei); // REC 0th iteration
> w.imag =-sinsp (pi / lei); // REC 0th iteration
>
> for (j = 1; j <= lei; j++)
> {
> for (i = j; i <= N; i = i + le)
> {
> ip = i + lei ;
> x1 = xin[ip-1];
> x2 = xin[i-1];
> t.real = x1.real * v.real - x1.imag * v.imag;
> t.imag = x1.real * v.imag + x1.imag * v.real;
> xin[ip-1].real = x2.real - t.real;
> xin[ip-1].imag = x2.imag - t.imag;
> xin[i-1].real = x2.real + t.real;
> xin[i-1].imag = x2.imag + t.imag;
> }
>
> // v = ComplexDataMul (v, w);
> temp = v.real * w.real - v.imag * w.imag; // REC
> v.imag = v.real * w.imag + v.imag * w.real; // REC
> v.real = temp; // REC
> }
> }
>
> return;
> }
>
I change my code as follow:
void ComplexFFT2(COMPLEX*xin,int n)
{
register float w_re, w_im, t_re, t_im;
register int m, g, b;
register int i, mt, k;
register float pi=3.1415926;
register int bev=0;
register int numbits=1,log2n=n;
register float x1_re,x1_im,x2_re,x2_im;
for(numbits=1,log2n>>=1;log2n!=1;numbits++)
{
log2n>>=1;
}
/* for each stage */
for (m=n; m>=2; m=m>>1)
{
/* m = n/2^s; mt = m/2; */
mt = m >> 1;
/* for each group of butterfly */
for (g=0,k=0; g {
bev=bitrev(k,numbits-1 );
w_re = cossp(2*pi*bev/n); //REC
w_im = -sinsp(2*pi*bev/n); //REC
/* for each butterfly */
for (b=g; b<(g+mt); b++)
{
x1_re=xin[b].real;
x1_im=xin[b].imag;
x2_re=xin[b+mt].real;
x2_im=xin[b+mt].imag;
t_re = w_re * x2_re - w_im * x2_im;
t_im = w_re * x2_im + w_im * x2_re;
xin[b].real = x1_re + t_re;
xin[b].imag = x1_im + t_im;
xin[b+mt].real = x1_re - t_re;
xin[b+mt].imag = x1_im - t_im;
}
}
}
BitReverse_complex(xin, n);
return ;
}
in new code,there is no recurrence caculation for the twiddle factor,and the result is much better,thanks.
but when the input size is large(128k etc.),the result is still unbearable unless I use 'double float' to recode it.
I think 'float's precision is not enough for the large size fft, the accumulation of the trunction error from butterfly effects the final result.
Do I must use 'double ' if I want to do large size fft and get a high precious result ?
Is there some trics to get result better without using 'double'?
2007-10-17"Andrew Nesterov" д
Hi Diao Yi Ping,
> Subject: Re: q:large size fft ?
> Posted by: "diaoyiping...@126.com" d...@126.com dyp_and_jmy
> Date: Mon Oct 15, 2007 4:36 am ((PDT))
>
> hi all:
> thanks for your previous advices,i check my code carefully, and find
> out some mistakes. now i test my code again with 1K, 16K, 128K (float)
> theoretical sine wave, attatched is the result.
>
> Again,when input size is 1K, the result is correct(no DC, and the frequency
> value is below 130dB),but when the size is larger, the result is worse, is
> the phenomenon right or still there is something wrong ?
Yes the phenomenon shows up predictably and yes there is something that causes
the problem:
Usually, floating point recurrences (look for the lines marked "// REC" in
your code below) are used when iterations quickly converge. Apparently, in the
case of trig recurrences, iterations does not converge, hence calculation errors
accumulate and in the case of large fft size may seriously affect accuracy of
calculated twiddle factors. I.e. the twiddle factors calculated recurrently may
significantly deviate from their exact (in the floating point sense) values.
E.g. for N=128K there are 64K iterations of the inner (second) loop at maximum,
and the same number of recurrences to calculate twiddle factors.
The workaround is quite simple: use a short table of exp(-2*pi*k*l/N) twiddles
and more elaborate indexing method into this table. There are quite a bit of
FFT codes on the net that use this method. To begin with, please take a look at
http://www.jjj.de/fxt/
The term W(k,l) = exp(-2*pi*i*k*l/N) is a periodic function of N, W(k) = W(k+N*m),
therefore the table needs to keep a number on the order of N values of W().
Rgds,
Andrew
> [...] I attach the code here:
>
> typedef struct tagComplexData
> {
> float real;
> float imag;
> } COMPLEX, *LPCOMPLEX;
>
> void ComplexFFT (COMPLEX * xin, int N)
> {
> register int f, m, nv2, nm1, i, k, j = 1, l;
> /* int f, m, nv2, nm1, i, k, j = N/2, l; */
> register COMPLEX v, w, t, x1, x2;
> nv2=N/2 ;
> f=N ;
>
> for (m = 1, f >>= 1; f != 1; m++)
> {
> f>>=1;
> }
>
> nm1=N-1 ;
>
> //bit reverse/
> for (i = 1; i <= nm1; i++)
> {
> if(i < j)
> {
> t = xin[j-1];
> xin[j-1] = xin[i-1];
> xin[i-1] = t;
> }
> k = nv2;
> while (k < j)
> {
> j = j - k;
> k = k / 2;
> }
> j = j + k;
> }
>
> register int le, lei, ip;
> register float pi = 3.1415926, temp;
>
> for (l = 1; l <= m; l++)
> {
> // le = pow (2, l);
> le = 1 << l;
> // is L not 1 !!!!
> lei = le / 2;
> v.real = 1.0;
> v.imag = 0.0;
> w.real = cossp (pi / lei); // REC 0th iteration
> w.imag =-sinsp (pi / lei); // REC 0th iteration
>
> for (j = 1; j <= lei; j++)
> {
> for (i = j; i <= N; i = i + le)
> {
> ip = i + lei ;
> x1 = xin[ip-1];
> x2 = xin[i-1];
> t.real = x1.real * v.real - x1.imag * v.imag;
> t.imag = x1.real * v.imag + x1.imag * v.real;
> xin[ip-1].real = x2.real - t.real;
> xin[ip-1].imag = x2.imag - t.imag;
> xin[i-1].real = x2.real + t.real;
> xin[i-1].imag = x2.imag + t.imag;
> }
>
> // v = ComplexDataMul (v, w);
> temp = v.real * w.real - v.imag * w.imag; // REC
> v.imag = v.real * w.imag + v.imag * w.real; // REC
> v.real = temp; // REC
> }
> }
>
> return;
> }
>
Reply by ●October 23, 20072007-10-23
Hi Diao Yi Ping,
I think I know what went wrong, but first would you try please
the modifications below.
There is a single precision float table of rather a large size,
e.g. for n = 131072 (2^17) the table size is 98304 or
384KBytes, so you need to place it in external memory.
The trick here is not the table itself, it is something
different :)
Here is the code:
#define N 131072 // used only to set max size of the table
#define PIM2 6.283185307179586476925286766559f // = 2 * PI
float table[3*N/4]; // minimum size is 3*largest(n)/4
float *sintbl, *costbl; // pointers to -sin, cos
void set_twiddles (int n, float *table)
{
int i, k;
float fact, arg;
fact = PIM2 / n; // or, better to use = (PIM2 * invn)
// since n is a power of 2, invn
// can be precalculaed exact:
// 2^(-17) = 7.62939453125e-6 or 0x37000000
for (i = n/4; i <= n; i++)
{
k = i - n/4;
arg = fact * i;
table[k] = sinsp (arg);
}
sintbl = table + n/4; // set up pointers to -sin, cos numbers
costbl = table + 0; // the trick is here
}
Before calling the FFT function, set up the twiddles table
by calling the set_table (n, table) function.
Now, in the FFT function, instead of calling -sinsp() and
cossp() calls use the table:
w_re = costbl[bev];
w_im = sintbl[bev];
Let the group know how it worked out.
Rgds,
Andrew
> Date: Thu, 18 Oct 2007 16:40:40 +0800 (CST)
> From: d...@126.com
> To: andrew nesterov , c...
> Subject: Re: Re: q:large size fft ?
>
> hi Andrew and all:
> I change my code as follow:
>
> void ComplexFFT2(COMPLEX*xin,int n)
> {
> register float w_re, w_im, t_re, t_im;
> register int m, g, b;
> register int i, mt, k;
> register float pi=3.1415926;
> register int bev=0;
> register int numbits=1,log2n=n;
> register float x1_re,x1_im,x2_re,x2_im;
> for(numbits=1,log2n>>=1;log2n!=1;numbits++)
> {
> log2n>>=1;
> }
>
> /* for each stage */
> for (m=n; m>=2; m=m>>1)
> {
> /* m = n/2^s; mt = m/2; */
> mt = m >> 1;
> /* for each group of butterfly */
> for (g=0,k=0; g > {
>
> bev=bitrev(k,numbits-1 );
> w_re = cossp(2*pi*bev/n); //REC
> w_im = -sinsp(2*pi*bev/n); //REC
>
> /* for each butterfly */
> for (b=g; b<(g+mt); b++)
> {
> x1_re=xin[b].real;
> x1_im=xin[b].imag;
> x2_re=xin[b+mt].real;
> x2_im=xin[b+mt].imag;
> t_re = w_re * x2_re - w_im * x2_im;
> t_im = w_re * x2_im + w_im * x2_re;
>
> xin[b].real = x1_re + t_re;
> xin[b].imag = x1_im + t_im;
> xin[b+mt].real = x1_re - t_re;
> xin[b+mt].imag = x1_im - t_im;
> }
> }
> }
> BitReverse_complex(xin, n);
>
> return ;
> }
> in new code,there is no recurrence caculation for the twiddle factor,and the
> result is much better,thanks.
> but when the input size is large(128k etc.),the result is still unbearable
> unless I use 'double float' to recode it.
> I think 'float's precision is not enough for the large size fft, the
> accumulation of the trunction error from butterfly effects the final result.
> Do I must use 'double ' if I want to do large size fft and get a high
> precious result ?
> Is there some trics to get result better without using 'double'?
> 2007-10-17"Andrew Nesterov" д
>
> Hi Diao Yi Ping,
>
>> Subject: Re: q:large size fft ?
>> Posted by: "d...@126.com" d...@126.com dyp_and_jmy
>> Date: Mon Oct 15, 2007 4:36 am ((PDT))
>>
>> hi all:
>> thanks for your previous advices,i check my code carefully, and find
>> out some mistakes. now i test my code again with 1K, 16K, 128K (float)
>> theoretical sine wave, attatched is the result.
>>
>> Again,when input size is 1K, the result is correct(no DC, and the frequency
>> value is below 130dB),but when the size is larger, the result is worse, is
>> the phenomenon right or still there is something wrong ?
>
> Yes the phenomenon shows up predictably and yes there is something that causes
> the problem:
>
> Usually, floating point recurrences (look for the lines marked "// REC" in
> your code below) are used when iterations quickly converge. Apparently, in the
> case of trig recurrences, iterations does not converge, hence calculation errors
> accumulate and in the case of large fft size may seriously affect accuracy of
> calculated twiddle factors. I.e. the twiddle factors calculated recurrently may
> significantly deviate from their exact (in the floating point sense) values.
> E.g. for N8K there are 64K iterations of the inner (second) loop at maximum,
> and the same number of recurrences to calculate twiddle factors.
>
> The workaround is quite simple: use a short table of exp(-2*pi*k*l/N) twiddles
> and more elaborate indexing method into this table. There are quite a bit of
> FFT codes on the net that use this method. To begin with, please take a look at
> http://www.jjj.de/fxt/
>
> The term W(k,l) = exp(-2*pi*i*k*l/N) is a periodic function of N, W(k) = W(k+N*m),
> therefore the table needs to keep a number on the order of N values of W().
>
> Rgds,
>
> Andrew
>
>> [...] I attach the code here:
>>
>> typedef struct tagComplexData
>> {
>> float real;
>> float imag;
>> } COMPLEX, *LPCOMPLEX;
>>
>> void ComplexFFT (COMPLEX * xin, int N)
>> {
>> register int f, m, nv2, nm1, i, k, j = 1, l;
>> /* int f, m, nv2, nm1, i, k, j = N/2, l; */
>> register COMPLEX v, w, t, x1, x2;
>> nv2=N/2 ;
>> f=N ;
>>
>> for (m = 1, f >>= 1; f != 1; m++)
>> {
>> f>>=1;
>> }
>>
>> nm1=N-1 ;
>>
>> //bit reverse/
>> for (i = 1; i <= nm1; i++)
>> {
>> if(i < j)
>> {
>> t = xin[j-1];
>> xin[j-1] = xin[i-1];
>> xin[i-1] = t;
>> }
>> k = nv2;
>> while (k < j)
>> {
>> j = j - k;
>> k = k / 2;
>> }
>> j = j + k;
>> }
>>
>> register int le, lei, ip;
>> register float pi = 3.1415926, temp;
>>
>> for (l = 1; l <= m; l++)
>> {
>> // le = pow (2, l);
>> le = 1 << l;
>> // is L not 1 !!!!
>> lei = le / 2;
>> v.real = 1.0;
>> v.imag = 0.0;
>> w.real = cossp (pi / lei); // REC 0th iteration
>> w.imag =-sinsp (pi / lei); // REC 0th iteration
>>
>> for (j = 1; j <= lei; j++)
>> {
>> for (i = j; i <= N; i = i + le)
>> {
>> ip = i + lei ;
>> x1 = xin[ip-1];
>> x2 = xin[i-1];
>> t.real = x1.real * v.real - x1.imag * v.imag;
>> t.imag = x1.real * v.imag + x1.imag * v.real;
>> xin[ip-1].real = x2.real - t.real;
>> xin[ip-1].imag = x2.imag - t.imag;
>> xin[i-1].real = x2.real + t.real;
>> xin[i-1].imag = x2.imag + t.imag;
>> }
>>
>> // v = ComplexDataMul (v, w);
>> temp = v.real * w.real - v.imag * w.imag; // REC
>> v.imag = v.real * w.imag + v.imag * w.real; // REC
>> v.real = temp; // REC
>> }
>> }
>>
>> return;
>> }
>> ȫ ְ Ů Ů 2 0 0 7
I think I know what went wrong, but first would you try please
the modifications below.
There is a single precision float table of rather a large size,
e.g. for n = 131072 (2^17) the table size is 98304 or
384KBytes, so you need to place it in external memory.
The trick here is not the table itself, it is something
different :)
Here is the code:
#define N 131072 // used only to set max size of the table
#define PIM2 6.283185307179586476925286766559f // = 2 * PI
float table[3*N/4]; // minimum size is 3*largest(n)/4
float *sintbl, *costbl; // pointers to -sin, cos
void set_twiddles (int n, float *table)
{
int i, k;
float fact, arg;
fact = PIM2 / n; // or, better to use = (PIM2 * invn)
// since n is a power of 2, invn
// can be precalculaed exact:
// 2^(-17) = 7.62939453125e-6 or 0x37000000
for (i = n/4; i <= n; i++)
{
k = i - n/4;
arg = fact * i;
table[k] = sinsp (arg);
}
sintbl = table + n/4; // set up pointers to -sin, cos numbers
costbl = table + 0; // the trick is here
}
Before calling the FFT function, set up the twiddles table
by calling the set_table (n, table) function.
Now, in the FFT function, instead of calling -sinsp() and
cossp() calls use the table:
w_re = costbl[bev];
w_im = sintbl[bev];
Let the group know how it worked out.
Rgds,
Andrew
> Date: Thu, 18 Oct 2007 16:40:40 +0800 (CST)
> From: d...@126.com
> To: andrew nesterov , c...
> Subject: Re: Re: q:large size fft ?
>
> hi Andrew and all:
> I change my code as follow:
>
> void ComplexFFT2(COMPLEX*xin,int n)
> {
> register float w_re, w_im, t_re, t_im;
> register int m, g, b;
> register int i, mt, k;
> register float pi=3.1415926;
> register int bev=0;
> register int numbits=1,log2n=n;
> register float x1_re,x1_im,x2_re,x2_im;
> for(numbits=1,log2n>>=1;log2n!=1;numbits++)
> {
> log2n>>=1;
> }
>
> /* for each stage */
> for (m=n; m>=2; m=m>>1)
> {
> /* m = n/2^s; mt = m/2; */
> mt = m >> 1;
> /* for each group of butterfly */
> for (g=0,k=0; g > {
>
> bev=bitrev(k,numbits-1 );
> w_re = cossp(2*pi*bev/n); //REC
> w_im = -sinsp(2*pi*bev/n); //REC
>
> /* for each butterfly */
> for (b=g; b<(g+mt); b++)
> {
> x1_re=xin[b].real;
> x1_im=xin[b].imag;
> x2_re=xin[b+mt].real;
> x2_im=xin[b+mt].imag;
> t_re = w_re * x2_re - w_im * x2_im;
> t_im = w_re * x2_im + w_im * x2_re;
>
> xin[b].real = x1_re + t_re;
> xin[b].imag = x1_im + t_im;
> xin[b+mt].real = x1_re - t_re;
> xin[b+mt].imag = x1_im - t_im;
> }
> }
> }
> BitReverse_complex(xin, n);
>
> return ;
> }
> in new code,there is no recurrence caculation for the twiddle factor,and the
> result is much better,thanks.
> but when the input size is large(128k etc.),the result is still unbearable
> unless I use 'double float' to recode it.
> I think 'float's precision is not enough for the large size fft, the
> accumulation of the trunction error from butterfly effects the final result.
> Do I must use 'double ' if I want to do large size fft and get a high
> precious result ?
> Is there some trics to get result better without using 'double'?
> 2007-10-17"Andrew Nesterov" д
>
> Hi Diao Yi Ping,
>
>> Subject: Re: q:large size fft ?
>> Posted by: "d...@126.com" d...@126.com dyp_and_jmy
>> Date: Mon Oct 15, 2007 4:36 am ((PDT))
>>
>> hi all:
>> thanks for your previous advices,i check my code carefully, and find
>> out some mistakes. now i test my code again with 1K, 16K, 128K (float)
>> theoretical sine wave, attatched is the result.
>>
>> Again,when input size is 1K, the result is correct(no DC, and the frequency
>> value is below 130dB),but when the size is larger, the result is worse, is
>> the phenomenon right or still there is something wrong ?
>
> Yes the phenomenon shows up predictably and yes there is something that causes
> the problem:
>
> Usually, floating point recurrences (look for the lines marked "// REC" in
> your code below) are used when iterations quickly converge. Apparently, in the
> case of trig recurrences, iterations does not converge, hence calculation errors
> accumulate and in the case of large fft size may seriously affect accuracy of
> calculated twiddle factors. I.e. the twiddle factors calculated recurrently may
> significantly deviate from their exact (in the floating point sense) values.
> E.g. for N8K there are 64K iterations of the inner (second) loop at maximum,
> and the same number of recurrences to calculate twiddle factors.
>
> The workaround is quite simple: use a short table of exp(-2*pi*k*l/N) twiddles
> and more elaborate indexing method into this table. There are quite a bit of
> FFT codes on the net that use this method. To begin with, please take a look at
> http://www.jjj.de/fxt/
>
> The term W(k,l) = exp(-2*pi*i*k*l/N) is a periodic function of N, W(k) = W(k+N*m),
> therefore the table needs to keep a number on the order of N values of W().
>
> Rgds,
>
> Andrew
>
>> [...] I attach the code here:
>>
>> typedef struct tagComplexData
>> {
>> float real;
>> float imag;
>> } COMPLEX, *LPCOMPLEX;
>>
>> void ComplexFFT (COMPLEX * xin, int N)
>> {
>> register int f, m, nv2, nm1, i, k, j = 1, l;
>> /* int f, m, nv2, nm1, i, k, j = N/2, l; */
>> register COMPLEX v, w, t, x1, x2;
>> nv2=N/2 ;
>> f=N ;
>>
>> for (m = 1, f >>= 1; f != 1; m++)
>> {
>> f>>=1;
>> }
>>
>> nm1=N-1 ;
>>
>> //bit reverse/
>> for (i = 1; i <= nm1; i++)
>> {
>> if(i < j)
>> {
>> t = xin[j-1];
>> xin[j-1] = xin[i-1];
>> xin[i-1] = t;
>> }
>> k = nv2;
>> while (k < j)
>> {
>> j = j - k;
>> k = k / 2;
>> }
>> j = j + k;
>> }
>>
>> register int le, lei, ip;
>> register float pi = 3.1415926, temp;
>>
>> for (l = 1; l <= m; l++)
>> {
>> // le = pow (2, l);
>> le = 1 << l;
>> // is L not 1 !!!!
>> lei = le / 2;
>> v.real = 1.0;
>> v.imag = 0.0;
>> w.real = cossp (pi / lei); // REC 0th iteration
>> w.imag =-sinsp (pi / lei); // REC 0th iteration
>>
>> for (j = 1; j <= lei; j++)
>> {
>> for (i = j; i <= N; i = i + le)
>> {
>> ip = i + lei ;
>> x1 = xin[ip-1];
>> x2 = xin[i-1];
>> t.real = x1.real * v.real - x1.imag * v.imag;
>> t.imag = x1.real * v.imag + x1.imag * v.real;
>> xin[ip-1].real = x2.real - t.real;
>> xin[ip-1].imag = x2.imag - t.imag;
>> xin[i-1].real = x2.real + t.real;
>> xin[i-1].imag = x2.imag + t.imag;
>> }
>>
>> // v = ComplexDataMul (v, w);
>> temp = v.real * w.real - v.imag * w.imag; // REC
>> v.imag = v.real * w.imag + v.imag * w.real; // REC
>> v.real = temp; // REC
>> }
>> }
>>
>> return;
>> }
>> ȫ ְ Ů Ů 2 0 0 7
Reply by ●October 23, 20072007-10-23






