Extend FFT spectrum question

Started by samecues 2 years ago · 6 replies · latest reply 2 years ago · 203 views

Hi all!

I have a somewhat strange question here. Imagine that I have this WOLA (weighted overlap-add) scenario:

Signal -> SineWindow -> FFT -> Process -> IFFT -> SineWindow and overlap

Ok, all is fine and it works as expected (I made this for noise reduction).
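For reference, the chain above can be sketched in Python/NumPy roughly as follows. This is only a minimal sketch under assumptions: the frame length of 1024, the function name `wola`, and the `process` callback are made up for illustration, and the sine window is taken as the square root of a periodic Hann window with 50 percent overlap, so that the squared windows sum to one.

```python
import numpy as np

def wola(x, n_fft=1024, process=None):
    """Sine-window WOLA with 50% overlap.

    `process` (hypothetical hook) maps one rfft frame to a modified rfft frame.
    """
    hop = n_fft // 2
    # Sine window = sqrt of a periodic Hann window. Applied twice (analysis and
    # synthesis), the squared windows of overlapping frames sum to exactly 1.
    win = np.sin(np.pi * np.arange(n_fft) / n_fft)
    # Zero-pad so that every input sample is covered by two frames.
    pad = np.concatenate([np.zeros(hop), x, np.zeros(n_fft)])
    out = np.zeros_like(pad)
    for start in range(0, len(pad) - n_fft + 1, hop):
        frame = pad[start:start + n_fft] * win            # analysis window
        spec = np.fft.rfft(frame)                         # FFT
        if process is not None:
            spec = process(spec)                          # spectral processing
        frame = np.fft.irfft(spec, n_fft) * win           # IFFT + synthesis window
        out[start:start + n_fft] += frame                 # overlap-add
    return out[hop:hop + len(x)]
```

With no processing, `wola(x)` should return `x` up to rounding error, which is a handy sanity check before adding any spectral modification.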

My question is: is it possible to copy part of the spectrum to another part? Imagine an MP3, and copying the 11 kHz to 15 kHz range into 16 kHz to 20 kHz (extending it). The top of the spectrum is 22050 Hz. I tried this some time ago and, when I visualized the output, I got strange results. Any advice is welcome, just for fun.

Thanks and best regards!

Reply by dudelsound, February 19, 2022

Hi - MP3 typically drops all information above about 16 kHz (the exact cutoff depends on the encoder and bitrate) - so you just don't have it anymore. Now, you are trying to add something (and duplicating 11-15 kHz is just one of many possible choices) - hoping the result will sound similar to what was there before MP3 removed it.

There is no way you can reliably reconstruct the upper frequencies, but there are MANY ways to create high frequency content that sounds PLAUSIBLE - so your question is really more about psychoacoustic plausibility than physical feasibility.

Audio effects that do this sort of thing are called "exciters" - you might want to google that.

Duplicating a spectrum of lower frequencies is one way to do it, but it will break up regular harmonic patterns if there are any - but then the human ear is very insensitive to pitch at high frequencies so it might still work.

Another way: you could extract some sort of envelope of the spectrum and guess how it extends to higher frequencies (e.g. you see a rise in amplitude from low towards high frequencies -> assume there will be a lot of energy in the missing bins, or vice versa). Then you can fill the high bins with noise and multiply it by the envelope.

The ear is rather insensitive to phase and pitch at these frequencies (>15kHz), so I would probably go for something noise-based rather than spectrum copies.
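A rough sketch of the noise-based idea, in Python/NumPy, could look like this. Everything specific here is an assumption made for illustration: the function name `noise_fill`, the straight-line fit to the log-magnitude of the last 64 bins below the cutoff, and the choice to keep the extrapolated envelope from rising (a safety clamp, so the filled band cannot blow up). It expects one `np.fft.rfft` frame of an even-length FFT.

```python
import numpy as np

def noise_fill(spec, cutoff_bin, fit_bins=64, rng=None):
    """Fill rfft bins >= cutoff_bin with noise shaped by an extrapolated envelope."""
    rng = np.random.default_rng() if rng is None else rng
    out = spec.astype(complex)  # astype returns a copy
    # Fit a straight line to the log-magnitude just below the cutoff.
    k = np.arange(cutoff_bin - fit_bins, cutoff_bin)
    log_mag = np.log(np.abs(spec[k]) + 1e-12)
    slope, intercept = np.polyfit(k, log_mag, 1)
    level_at_cutoff = slope * cutoff_bin + intercept
    slope = min(slope, 0.0)     # assumption: never let the envelope rise
    # Extrapolate the envelope into the missing bins.
    hi = np.arange(cutoff_bin, len(spec))
    envelope = np.exp(level_at_cutoff + slope * (hi - cutoff_bin))
    # Noise: envelope magnitude with uniformly random phase.
    phase = rng.uniform(0.0, 2.0 * np.pi, len(hi))
    out[hi] = envelope * np.exp(1j * phase)
    out[-1] = out[-1].real      # Nyquist bin of an even-length FFT is real
    return out
```

It would be called once per frame between the FFT and the IFFT. Since the phases are redrawn for every frame, the filled band behaves like shaped noise rather than a copy of anything in the signal.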

The brute force way to do it would be to train a huge neural network with the MP3-compressed spectra as input and the original spectra as a reconstruction target and let the network find plausible extensions.

Reply by samecues, February 19, 2022

Hi dudelsound!

Thank you for your answer and for your suggestions. Yes, that's the scenario with MP3. I had not heard about exciters, and your envelope approach sounds good. I will test it with OLA and linear convolution; perhaps WOLA is not suitable for this approach.

Reply by fharris, February 19, 2022


What is the sample rate? How did you pick up the spectral span and paste it into the offset span? What window is used in the transform? How much do the windowed intervals overlap?

fred h 

Reply by samecues, February 19, 2022

Hi fharris!

- Sample rate is 44100 Hz. Audio signal. I copy the bins directly from 11 kHz - 15 kHz and paste them into 16 kHz - 20 kHz. Maybe the issue is phase?

- Window is the Sine Window (sqrt of Hann Window)

- 50 percent overlap
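On the phase question, here is a hedged sketch of the direct bin copy for this setup (44100 Hz, sine window, 50 percent overlap). The frame length of 1024 and all names are assumptions. Pasting bins `SHIFT` positions higher is the same as modulating each frame by a carrier that restarts at the beginning of every frame, so the pasted band is only phase-continuous from frame to frame if it is rotated by `exp(2j*pi*SHIFT*frame_start/N_FFT)`. With a hop of half a frame and an odd `SHIFT` (117 with these numbers), leaving that term out flips the sign of the pasted band on every other frame, and it partly cancels in the overlap-add. That may or may not be what happened in the original experiment.

```python
import numpy as np

FS = 44100        # sample rate given in the thread
N_FFT = 1024      # hypothetical frame length (not stated in the thread)
HOP = N_FFT // 2  # 50 percent overlap

def hz_to_bin(freq_hz):
    return int(round(freq_hz * N_FFT / FS))

SRC = hz_to_bin(11000)    # first source bin
DST = hz_to_bin(16000)    # first destination bin
WIDTH = hz_to_bin(4000)   # number of bins copied
SHIFT = DST - SRC         # shift in bins

def copy_band(spec, frame_start, fix_phase=True):
    """Paste rfft bins SRC..SRC+WIDTH onto DST..DST+WIDTH for one frame."""
    out = spec.copy()
    band = spec[SRC:SRC + WIDTH]
    if fix_phase:
        # Make the implied carrier continuous across frames.
        band = band * np.exp(2j * np.pi * SHIFT * frame_start / N_FFT)
    out[DST:DST + WIDTH] = band
    return out

def extend(x, fix_phase=True):
    """Sine-window WOLA (50% overlap) with the band copy applied per frame."""
    win = np.sin(np.pi * np.arange(N_FFT) / N_FFT)
    pad = np.concatenate([np.zeros(HOP), x, np.zeros(N_FFT)])
    out = np.zeros_like(pad)
    for start in range(0, len(pad) - N_FFT + 1, HOP):
        spec = np.fft.rfft(pad[start:start + N_FFT] * win)
        spec = copy_band(spec, start, fix_phase)
        out[start:start + N_FFT] += np.fft.irfft(spec, N_FFT) * win
    return out[HOP:HOP + len(x)]
```

A quick way to compare the two variants is to feed a steady tone inside the source band through `extend(x, True)` and `extend(x, False)` and look at the level of the shifted tone in each output.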

Reply by ing_jpu, February 19, 2022

More than the feasibility, I suppose your question is about the meaning of the resulting "strange" signal. If the Nyquist frequency is 22.050 kHz, and assuming the copy goes into both the 16 kHz to 20 kHz interval and its mirror image at 24.1 kHz to 28.1 kHz (otherwise the IFFT returns a complex signal), high-frequency content is being added to the signal. The IFFT waveform will depend on the energy content of the copied 11 kHz to 15 kHz interval. Whether the WOLA process attenuates such components may depend on the length of the window and on whether some periodicity is captured. Strange answers to strange questions...
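The point about the mirrored interval can be checked with a small Python/NumPy sketch. The frame length of 1024 and the bin indices are assumptions (roughly 11 kHz, 16 kHz and a 4 kHz width at 44100 Hz). With a full-length FFT, bin `N - k` is the mirror of bin `k`, which is where 24.1 kHz to 28.1 kHz comes from, and it has to hold the complex conjugate for the IFFT to be real.

```python
import numpy as np

N = 1024                        # hypothetical frame length
SRC, DST, WIDTH = 255, 372, 93  # hypothetical bins: ~11 kHz, ~16 kHz, ~4 kHz wide

def copy_one_sided(X):
    """Copy the band in the positive-frequency half only."""
    Y = X.copy()
    Y[DST:DST + WIDTH] = X[SRC:SRC + WIDTH]
    return Y

def copy_hermitian(X):
    """Copy the band and also write its conjugate into the mirrored bins N - k."""
    Y = copy_one_sided(X)
    k = np.arange(DST, DST + WIDTH)
    Y[N - k] = np.conj(Y[k])
    return Y

rng = np.random.default_rng(0)
X = np.fft.fft(rng.standard_normal(N))  # spectrum of a real test frame

y_one_sided = np.fft.ifft(copy_one_sided(X))   # has a non-zero imaginary part
y_hermitian = np.fft.ifft(copy_hermitian(X))   # imaginary part is rounding noise

print(np.max(np.abs(y_one_sided.imag)), np.max(np.abs(y_hermitian.imag)))
```

When `np.fft.rfft` and `np.fft.irfft` are used instead, only bins 0 to N/2 are stored and the mirrored half is implied, so copying within the 0 to 22050 Hz range is enough.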

Reply by samecues, February 19, 2022

Hi ing_jpu!

The spectrum has a range of 0 to 22050 Hz. The original signal has a cutoff at 15000 Hz. My question is about copying a portion of this spectrum (11000 Hz to 15000 Hz) into the zeroed portion from 16000 Hz to 20000 Hz. This gives a signal that is the same as the original but with content added at 16000 Hz to 20000 Hz (not at 24100 Hz to 28100 Hz). My question is whether that is possible. In my test it sounds OK, but in the resulting spectrum the added components had much lower amplitudes than the original portion.