ARM CMSIS FFT - does it produce correct results?

Started by mjbcswitzerland 7 years ago15 replieslatest reply 5 years ago6341 views

Hi All

Does anyone have experience with the #ARM CMSIS #FFT? Specifically the floating point implementation arm_cfft_f32()?

Although testing the code using the ARM reference signal results in it passing the test (a noise signal with a component that it has to recognise as largest value) the actual frequency components of 'real' signals have what looks like large distortions.

During the last week or so I have tested the code on various system (pure PC environment, Cortex M4 with and without FPU) and in each case I get identical results (passes the test but real signals are inaccurate).

After some peer reviews where we checked all electrical tests (when using ADC and real signals) and also tested the simulated inputs (when generating signals internally for example), and also compared results using the same inputs with Matlab the result remains the same - the ARM CMSIS output has deviations.

After asking for help at NXP (the processor used for HW tests) and mbed (ARM) and getting no response I am hoping that a more specialist group may have a better chance. Therefore, these are the two posts filled hopefully with complete information to understand the situation.

Although the result does include the expected signals (looks not that bad at first glance) it must be close, but the large distortion signal in the output is something that can't be explained yet. The questions is whether we are making an error in interpreting the output and it is in fact usable - if one knows how - or whether we need to find a different solution for pressing project requirements.... In the worst case, what library would people actually use for embedded projects with smallish processors?



[ - ]
Reply by Rick LyonsSeptember 10, 2018

Hi.  Try these tests:

Test# 1: Fill your input buffer with 512 samples with each sample having an amplitude of +1. Your magnitude spectrum should show energy (magnitude = 512) at zero Hz and no spectral energy at any other frequency.

Test# 2: Fill your input buffer with a single sample having an amplitude of +1 and the next 511 samples equal to zero. Your magnitude spectrum should show energy (magnitude = 512) at zero Hz and no spectral energy at any other frequency. Your magnitude spectrum should show a magnitude = 1 at all frequencies.(You spectral magnitude curve should be a flat line.)

Test# 3: Fill your input buffer with six samples whose amplitudes are all +1 followed by 506 samples whose amplitudes are all zero. Your magnitude spectrum should show a classic |sin(x)/x| spectral envelope with the DC, zero Hz, spectral component showing a magnitude of six.

Test# 4: Fill your input buffer with *exactly* three cycles of a sine wave whose peak value is +3. (Your first and last input samples should NOT have the same amplitude!)  Your magnitude spectrum should show a sequence of 0,0,0,768,0,0,0,0,0,0,0,0,0,0,0,... .

If your FFT system gives you the expected results for those four tests, then that's a moderately good sign that your FFT processing is working properly.

Let us know how that testing went for you.

[ - ]
Reply by mjbcswitzerlandSeptember 10, 2018


Many thanks for the very wise advice. I have just performed the 4 tests.



I get 360 rather than 512 but only DC.



This test is as expected.



This starts off well but is not really a classic sin(x)/x



This one shows the behavior that we have been seeing. It starts 0.01,0.2,0.7,545...

but finishes
0.56, 1, 566, 0.7, 0.23

Unfortunately the sign that this gives is a little less than moderately good (?)


[ - ]
Reply by drmikeSeptember 10, 2018


What is the data format in and out of the call?  It looks like a factor of 2 in a pointer is wrong.  If not, then the FFT routine has a major bug.  Rick's test should be all zeros after the 768 in bin 3.  The sync result is also mirroring about bin 128, and that is not right.  The comparison between Matlab and this also says there is a bug.

The question is - is it your bug, or the subroutine?  Is their complex format different than what most of us use?  Is there a funny offset they have that moves the data so your pointers are not where they expect?  I have been bitten by lots of really stupid errors where an "off by 1" turns a simple thing into a randomized freak show.  If you have everything right, abandon that subroutine and look for something else.


[ - ]
Reply by mjbcswitzerlandSeptember 10, 2018


I am using single-precision floating point (the routine used is for that). Since I have an FPU in the final processor this is the best choice in my case rather than using the alternative integer (q15 etc.) variations.

This is the complete test code taking Rick's sin(x)/x expected output since it is easy to prepare (hardly likely to be a programming error) and the output easily recognisable.

float32_t testInput_f32_10khz[512] = {0};
testInput_f32_10khz[0] = 1.0;
testInput_f32_10khz[1] = 1.0;
testInput_f32_10khz[2] = 1.0;
testInput_f32_10khz[3] = 1.0;
testInput_f32_10khz[4] = 1.0;
testInput_f32_10khz[5] = 1.0; // we now have 6 x 1.0 followed by 0.0 in the 512 byte input buffer
arm_cfft_f32(&arm_cfft_sR_f32_len256, testInput_f32_10khz, 0, 1); // 256 point FFT from 512 sample input buffer
arm_cmplx_mag_f32(testInput_f32_10khz, testInput_f32_10khz, fftSize);// process the data through the Complex Magnitude Module for calculating the magnitude at each bin
// Now plot out the first 256 floats in the buffer
// - we expect a sin(x)/x display

The only thing that I am doing is preparing a 512 float input buffer, filling it with 0 and then setting the first 6 sample to 1.0. Then there are just the 2 standard ARM CMSIS DSP library calls.

To avoid doubt that the display is wrong I stop the debugger after the calls have been performed and copy the content of the buffer (first 256 floats) into excel to display the graph. I can't remove any more of my own code that could be the source of bugs because it is absolute minimum now.

Originally I was feeding the signal in via USB audio and displaying it on a TFT display in real time. It actually looked quite good since playing music gave a nice graphical equaliser effect and it was only when starting testing sine wave sweeps that it became obvious that something wasn't right. This is where it was assumed that the input sampling must be wrong, the input buffer preparation must have bugs or the output display must be off - but I can't reduce it any more than this and in any test mode (with any input or output source) the results are identical(ly bad).

I wish dearly that I could find an error being made in its use because I basically can't believe the library which is supposed to be the defacto standard for ARM processors since about 5 years doesn't work (and I still can't). There must be an explanation so that I can sleep at night again....

However I have a deadline of Tuesday next week. If the reason can't be identified we will have to drop CMSIS and start fresh with something else (and hope that that works off-the-bat).

Until then I am however not giving up and if there is no indication of usage error I will even get my text book out and see whether I can understand the code being executed (in the 16 point case of course, where the effect is identical).



[ - ]
Reply by Tim WescottSeptember 10, 2018

If the vector you're handing to the FFT routine is expected to be complex, and you're thinking a vector of reals, then it would explain much if not most of the problem.

Your result from the 6 ones, 250 zeros is particularly telling -- it looks like what you'd get from three ones, right down to the magnitude.

Go spelunking in the documentation, maybe?

[ - ]
Reply by mjbcswitzerlandSeptember 10, 2018


I think that you have hit the nail on the head. arm_cfft_f32() looks to really need complex input and doesn't just give complex output. Reading the documents again it isn't clear at all that the input is complex since the understanding is that it is referring to a complex (as expected).

If I add a 0.0 between every input sample things are beginning to make sense. The shapes are symmetrical but the 256 point FFT is only actually giving 128 bins - the second set is mirrored.

Now I see that there is also a function called arm_rfft_f32()
that specifies that it accepts real inputs. Since I am really tight on RAM the complex one probably can't be used to get the frequency resolution needed using a complex input buffer (of twice the foreseen size). Reading the description to this version it points out that complex versions expect complex inputs, which is a shame because it would have been clear if this one had been read before....

Therefore I will try swapping in the real version to see what it does. Once I understand the differences I'll give a conclusion.



[ - ]
Reply by mukulSeptember 10, 2018

Hi Mark,

Does arm_rfft_f32 function work for you?

Is it possible to get only +ve frequencies in output buffer with this function as FFT of a real signal will be mirror and it will be a waste of compute and memory (double of complex output buffer size) for generating -ve frequencies.

For N-floating point real input buffer what is the output size of buffer (considering output will be complex as well there may or may not be -ve frequencies computed).

I could not get these details from the documentation.



[ - ]
Reply by Tim WescottSeptember 10, 2018

I think all the suggestions so far are good ones.  I would add to Rick's tests, though: once you've done his, do a sweep from frequency 1/N to 1/2 in steps of 1/N (in other words, all of the discrete frequencies that have an integer number of cycles).  For the most part (except for f = 1/N) your results should be identical to what Rick suggests, but with the non-zero bin corresponding to the frequency.

This is good to know, by the way.  I hope that it's not a real problem, but if I have occasion to use the code I'll know to check it before I let a project's results hinge on its performance.

[ - ]
Reply by doctekSeptember 10, 2018

Others have pointed at this, but let me make a clear statement and quote from the mbed documentation for the function: 

The input data is complex and contains 2*fftLen interleaved values as shown below. {real[0], imag[0], real[1], imag[1],..} 

The FFT result will be contained in the same array and the frequency domain values will have the same interleaving. 

If you're not putting the data in in the correct format, you won't get the results you expect. Try fixing your input data as indicated and see what result you get.

[ - ]
Reply by Tim WescottSeptember 10, 2018

That's what I suspected.  I was just too lazy to go RTFM (free advise, get what you pay for, etc.).  I'm not sure that it would cause absolutely ALL the problems that the OP is seeing, but it would certainly cause SOME -- and if he's populating 256 items in an array and then doing a 256-point FFT, that leaves 256 uninitialized values that he's operating on.

[ - ]
Reply by mjbcswitzerlandSeptember 10, 2018

Hi All

I decided not to use the RFFT and stick with the CFFT one - just constructing the required complex data at the input.
This requires the additional copy and extra storage space but is Ok for the first utilisation and I want to later revisit where the RFFT may be able to simplify a little (and also look at q31 and q15 options rather than floating point).

Here is the CFFT in action:

Thanks and regards


[ - ]
Reply by drmikeSeptember 10, 2018

I'm not familiar with the arm version, but looking at your plots my first guess is 7 + 17 = 24 which implies the 17 is really -7.  Look at how you are doing the magnitude calculation, it may be both those peaks should be added into 1 bin.  

There are lots of other things that can be going on though, that's just the first thing that is curious.


[ - ]
Reply by mjbcswitzerlandSeptember 10, 2018

Hi Mike

The magnitude calculation is arm_cmplx_mag_f32(fft_input_buffer, fft_magnitude_buffer, 256);, however it is only doing a root-mean-square of the imaginary and real components in each bin. Therefore it doesn't change the location of the bins in any way.

I also did my own calculation to be sure and it did the same.

The final parameter passed to the FFT is whether it should do bit reversal or not.

arm_cfft_f32(&arm_cfft_sR_f32_len256, fft_input_buffer, 0, 1);

With it set to 0 the bins are all mixed up (of course). Since ARM changed the bit reversal code between CMSIS DSP V3 and V5, where the bit reversal code was changed to an assembler implementation I was wondering whether the bit reversal was going wrong somehow. I rewrote the routine in C code to see what it does and it gives identical results.

The parameter 0 is telling it not to do an inverse transform.

If I do a frequency sweep from 0..24kHz I see the expected component moving from 0..8kHz. At the same time I see a component moving from 24kHz down. At 8kHz the component that started at 24kHz is 3x higher in amplitude than the expected on. Then as the sweep continues the expected one actually drops in amplitude and is quite small at 10kHz and the one that is coming down from 24kHz remains strong. They cross at 12kHz and the expected one continues up towards 24kHz, the unexpected one continued down towards 0. The one coming down is always stronger than the expected one.

The last revision of the FFT routine was 19. March 2015 I am am assuming that there must be thousands of applications using it. We have studied the ARM reference manuals and find nothing that mentions that there is any strange ordering of results that needs to be considered. There seems to be no mention of problems with it in the Internet and we can't understand that we can't get a reasonable response to a simple sine wave input using a single library call (on multiple HW platforms).

Here's another example where we have taken a simple 16 point FFT with a windowed signal (32 samples of a simple test single).

The same 32 input samples into the same FFT in Matlab gives:

There are certain similarities and their is a feeling that something is inverted but no attempts to work out what needs to be moved around has been successful yet!



P.S. Images went lost when posing so added info here:

[ - ]
Reply by grogerSeptember 10, 2018

I agree that there's something bizarre with it. Try a sampling frequency that is a power of 2.

It seemed that when I tried using it, this was the case. I could only get 1024, 2048, 4096..etc... to produce correct bin results.

[ - ]
Reply by mjbcswitzerlandSeptember 10, 2018


There is no sampling frequency involved since the tests are just in static buffers.