How to decimate/nterpolate in frequency domain and recover a proper causal time signal ?

Started by November 20, 2006
Hi everyone !

I'm new in this group and quite intermediate in DSP applications. I'm
facing the following problem in the framework of my study.

I've got some 4 seconds recorded sounds (44100Hz, 16bits) for which I'd
like to modify the FFT spectrum. Actually, I'd like to replace at a
constant rate (e.g. 3 values every four values, from the DC component)
some initial values with "new" values issued from the linear
interpolation between the remaining values, and get back the
corresponding modified sound, thus keeping the same sampling frequency,
i.e. 44100Hz.
I've thought about the following steps for the process :
- first, applying FFT to get the original spectrum.
- then, retaining 1 value every 4 values from sample 0 to
sample N/2, amounting in a way to frequency decimation,
- Interpolating linearly the real and imaginary parts of the spectrum
to recover values at the decimated frequency bins.
- reconstituting the complete spectrum [0-Fe[ by flipping complex
numbers 1 to N/2 of the vector obtained at the previous step, taking
their conjugate and concatenating them with the above vector,
- finally, recompute the time signal by processing an ifft on the
previous sequence and keeping the real part of the result.

I've tried this process but it seems it doesn't work : the obtained
time signal looks really distorted, probably owing to a phenomenon of
time aliasing.
especially in the time domain (and problems of frequency aliasing), but
I've found nothing about equivalent processes in the frequency domain.
I'm really getting stuck on this problem; can anyone help me to detect
where my reasonning is erroneous and indicate me how to process in a
proper way to obtain the expected result ?

Scoubi.

scoubi wrote:
> Hi everyone ! > > I'm new in this group and quite intermediate in DSP applications. I'm > facing the following problem in the framework of my study. > > I've got some 4 seconds recorded sounds (44100Hz, 16bits) for which I'd > like to modify the FFT spectrum. Actually, I'd like to replace at a > constant rate (e.g. 3 values every four values, from the DC component) > some initial values with "new" values issued from the linear > interpolation between the remaining values, and get back the > corresponding modified sound, thus keeping the same sampling frequency, > i.e. 44100Hz. > I've thought about the following steps for the process : > - first, applying FFT to get the original spectrum. > - then, retaining 1 value every 4 values from sample 0 to > sample N/2, amounting in a way to frequency decimation, > - Interpolating linearly the real and imaginary parts of the spectrum > to recover values at the decimated frequency bins. > - reconstituting the complete spectrum [0-Fe[ by flipping complex > numbers 1 to N/2 of the vector obtained at the previous step, taking > their conjugate and concatenating them with the above vector, > - finally, recompute the time signal by processing an ifft on the > previous sequence and keeping the real part of the result. > > I've tried this process but it seems it doesn't work : the obtained > time signal looks really distorted, probably owing to a phenomenon of > time aliasing. > I've read in the litterature about decimation / interpolation > especially in the time domain (and problems of frequency aliasing), but > I've found nothing about equivalent processes in the frequency domain. > I'm really getting stuck on this problem; can anyone help me to detect > where my reasonning is erroneous and indicate me how to process in a > proper way to obtain the expected result ? > > Thanks in Advance, > > Scoubi.
It would help if you would state what the expected result is. What are you trying to do? John

> > Scoubi. > > It would help if you would state what the expected result is. What are > you trying to do? > > John
Sorry, The expected result is to obtain by IFFT a "properly" modified sound, reflecting the mentioned modifications processed on the FFT spectum. By "properly", I mean I had expected to get a sound distorted because of these modifications, keeping nevertheless the structure of the original sound. But, when testing on a musical sound 4s in length, the modified sound I got has lost the structure of the original sound (time aliasing-like phenomena apparently occur); the latter doesn't mean anything at all, the temporal wrapping of the signal looks like a sinc shape whose period seems to be especially linked to the decimation factor. Does this help ? If not, don't hesitate to tell me. Thanks, Scoubi.
> The expected result is to obtain by IFFT a "properly" modified sound, > reflecting the mentioned modifications processed on the FFT spectum. By > "properly", I mean I had expected to get a sound distorted because of > these modifications, keeping nevertheless the structure of the original > sound. But, when testing on a musical sound 4s in length, the modified > sound I got has lost the structure of the original sound (time > aliasing-like phenomena apparently occur); the latter doesn't mean > anything at all, the temporal wrapping of the signal looks like a sinc > shape whose period seems to be especially linked to the decimation > factor.
It sounds like you may not be using enough frequency samples when doing your IFFT. Do you perhaps have a programming error that is preventing you from using the frequency samples that you interpolated? It sounds as if your reconstruction of the signal is using fewer input points to the IFFT than you got out of your original FFT. The dual of the sampling theorem here applies; just as sampling too infrequently in the time domain gives you aliasing in the frequency domain, sampling too infrequently in the frequency domain gives you aliasing in the time domain. Jason
scoubi wrote:
> >>> Scoubi. >> It would help if you would state what the expected result is. What are >> you trying to do? >> >> John > > Sorry, > > The expected result is to obtain by IFFT a "properly" modified sound, > reflecting the mentioned modifications processed on the FFT spectum. By > "properly", I mean I had expected to get a sound distorted because of > these modifications, keeping nevertheless the structure of the original > sound. But, when testing on a musical sound 4s in length, the modified > sound I got has lost the structure of the original sound (time > aliasing-like phenomena apparently occur); the latter doesn't mean > anything at all, the temporal wrapping of the signal looks like a sinc > shape whose period seems to be especially linked to the decimation > factor. > > Does this help ? > If not, don't hesitate to tell me. > > Thanks, > Scoubi. >
Time domain decimation causes squeezing and replication in the frequency domain. If the original signal was sufficiently oversampled, one can use a LPF to eliminate the frequency domain replicas and clean up the time signal. If it wasn't oversampled enough, the signal can't be cleaned up because the replicas have overlapped each other. Frequency domain decimation causes squeezing and replication in the time domain. Your bin interpolator is the dual of the LPF mentioned above. It seems to me that the questions to ask are do the time replicas overlap is the filter good enough. Using a bigger FFT with some zero padding might be necessary. John
> It sounds like you may not be using enough frequency samples when doing > your IFFT. Do you perhaps have a programming error that is preventing > you from using the frequency samples that you interpolated? It sounds > as if your reconstruction of the signal is using fewer input points to > the IFFT than you got out of your original FFT. The dual of the > sampling theorem here applies; just as sampling too infrequently in the > time domain gives you aliasing in the frequency domain, sampling too > infrequently in the frequency domain gives you aliasing in the time > domain. > > Jason
Hi jason, Thanks for your reply. I 've just had a look to the matlab routine I used for the process
> It sounds like you may not be using enough frequency samples when doing > your IFFT.
Yet, before running the iFFT, the reconstructed complex spectrum has the same number of bins (i.e. 176400) as the original one. I don't know if the matter really stems from that fact; can you confirm me this at the sight of the file .m mentioned below? Thanks in advance. Scoubi. -------------------------------------------- file .m %%Reading of the original stereo music sample Y=wavread('OriginalSample.wav'); %%FFT Calculation FFTY=fft(Y); %%Constant values %Sampling Frequency fs=44100; %Time duration length=4; %Number of bins N=length*fs; %Frequency vector (frequencies until Nyquist frequency, included) vecfnyquist=[0:1/length:N/(2*length)]; %%Spectrum decimation (for instance 1 value retained every four values, from DC component) FFTYdecim=FFTY(1:4:N/2+1,:); ReFFTYdecim=real(FFTYdecim); ImFFTYdecim=imag(FFTYdecim); %%Linear interpolation of real and imaginary parts of the spectrum within the range %%[0;Fnyquist] %%(i.e. replacement of the decimated values with the values issued from the linear interp. %%processed between the remaining values) for i=1:2; ReFFTYreshape(:,i)=interp1(vecfnyquist(1:4:end),ReFFTYdecim(:,i),vecfnyquist); ImFFTYreshape(:,i)=interp1(vecfnyquist(1:4:end),ImFFTYdecim(:,i),vecfnyquist); end; %%Reconstruction of the modified complex spectrum within the range [0;Fnyquist] from its %%real and imaginary parts FFTYreshape=ReFFTYreshape+i*ImFFTYreshape; %%Reconstruction of the entire complex spectrum within the range [0;fs[ FFTYcomplete=cat(1,FFTYreshape,fliplr(conj(FFTYreshape(2:end-1,:))')') %%Calculation of the corresponding modified signal by IFFT Ymodif=real(ifft(FFTYcomplete)); %.Wav writing of the modified signal wavwrite(Ymodif/(1.001*max(max(Ymodif))),fs,16,'ModifiedSample.wav');
John Sampson a =E9crit :

> Time domain decimation causes squeezing and replication in the frequency > domain. If the original signal was sufficiently oversampled, one can use > a LPF to eliminate the frequency domain replicas and clean up the time > signal. If it wasn't oversampled enough, the signal can't be cleaned up > because the replicas have overlapped each other. > > Frequency domain decimation causes squeezing and replication in the time > domain. Your bin interpolator is the dual of the LPF mentioned above. It > seems to me that the questions to ask are do the time replicas overlap > is the filter good enough. Using a bigger FFT with some zero padding > might be necessary. > > John
Thanks for you reply John, I'm not sure to have understood what you meant by :
> It seems to me that the questions to ask are do the time replicas overlap > is the filter good enough
Can you give me some precisions ? Nevertheless, I've followed your advice, i.e. zero-padding the FFT Intuitively, I've added a number of zeros equal to three times the initial number of bins (3*176400), for a decimation factor equal to 4. I have implemented it in the M file (mentioned in my previous post) in the following way: FFTYcomplete=3Dcat(1,FFTYreshape,fliplr(conj(FFTYreshape(2:end-1,:))')',zer= os(529200,2)) but it seems the problem remains: the signal is longer, but it is still wrapped in a sinc-like shape. Do you see any other mistake ? Scoubi.
anything wrong. What you're doing (decimating in the frequency domain
followed by linear interpolation between the samples) is equivalent to
doing the following:

1. Setting 3 out of every 4 frequency samples to zero, but leaving them
in the vector (this differentiates this operation from decimation,
which would actually throw the samples away). This is actually
identical to a decimation step followed by an upsampling step (I'm
using upsampling to refer to the initial step in interpolation,
inserting extra zeros as needed). This performs the frequency domain
expansion/contraction operations that John talked about, so your
sampling rate could become an issue here.
2. Convolving the resulting vector with a triangle function; this is
the actual interpolation. If you don't see how this works, draw it out
and you'll see.

Therefore, convolution of the frequency domain samples with the
triangle function yields a multiplication in the time domain between
your original signal and the inverse transform of the triangle
function, which happens to be a sinc^2. That's what gives you the
envelope that you see, which, as you stated previously, is related to
the decimation factor (the decimation factor affects the length of the
triangle function, which affects the mainlobe width of the sinc).

What exactly are you trying to accomplish?

Jason

> What exactly are you trying to accomplish?
Actually, I'm trying to make a parallel between the numerical and experimental parts of my work that deals with the sound quality of structures. In short, I'm going in a future stage to calculate with a vibro-acoustic tool (in the frequency domain) some sounds radiated by a structure, from the knowledge of its vibratory velocities. But, owing to computation limitations, I will not be able to calculate the frequency responses (complex spectra of the noises) with a fine frequency step (at best maybe 1 or 2 Hz), I will then have to linearly interpolate the spectra to create additional "fictive" values and then get a sufficient number of values (i.e. 176400) so that, when going back to the time domain by IFFT, I can get a sound sample 4s in length of quite good quality, i.e. involving a sampling rate compatible (i.e. 44100 Hz) with the purpose of subjective evaluation I wanna run. Thus, from the radiated sounds I have already recorded experimentally (the same as the ones I will calculate), I'd like to collect some useful information about the value of the maximum frequency step I could later use in simulations without significantly modyfing the perceptual trends I would get from an evaluation of the recorded noises, modified following the procedure we are discussing (and thus involving a first stage of "decimation" to simulate the future coarse frequency step used under the vibro-acoustic tool). Sorry if I have dwelt too much, it may be a little boring or confuse ! :-)) But this is the why of the how :-) Otherwise, I've started to implement what you mentioned in your post, I think you hold the good way, you exactly described what I had observed. Could you just advise me about some elements with which I do not feel very comfortable, so that I can converge more quickly to the solution. Notably:
>Convolving the resulting vector with a triangle function; this is > the actual interpolation. If you don't see how this works, draw it out > and you'll see.
I don't manage to obtain the results of the interpolation I previously observed, when now using the convolution of the "decimated" spectrum (in the way you specified) with the triangular function, what should be the parameters of the triangle function (width, delay, sequence length...) so that it could work, e.g. for a decimation factor equal to 4 ? Should I perform the convolution for the whole spectrum (i.e. [0;Fe[) or the partial spectrum ([0; Fe/2], then reconstruction), on the complex values or separately on real and imaginary parts ?
> Therefore, convolution of the frequency domain samples with the > triangle function yields a multiplication in the time domain between > your original signal and the inverse transform of the triangle > function, which happens to be a sinc^2. That's what gives you the > envelope that you see, which, as you stated previously, is related to > the decimation factor (the decimation factor affects the length of the > triangle function, which affects the mainlobe width of the sinc). >
To cancel afterwards the effects of multiplication in the time domain of the signal by the function sinc^2, would the appropriate correction consist in dividing the signal, that I've got by performing the IFFT on the transformed spectrum (e.g. interpolated using the convolution process), by this same function sinc^2 ? Nan values may occur for zeros values of the function, don't they ? Or must this correction be applied in another more appropriate way ? Once again thank you very much for your help and your patience, Scoubi.
> I don't manage to obtain the results of the interpolation I previously > observed, when now using the convolution of the "decimated" spectrum > (in the way you specified) with the triangular function, what should be > the parameters of the triangle function (width, delay, sequence > length...) so that it could work, e.g. for a decimation factor equal > to 4 ? Should I perform the convolution for the whole spectrum (i.e. > [0;Fe[) or the partial spectrum ([0; Fe/2], then reconstruction), on > the complex values or separately on real and imaginary parts ? > I don't manage to obtain the results of the interpolation I previously > observed, when now using the convolution of the "decimated" spectrum > (in the way you specified) with the triangular function, what should be > the parameters of the triangle function (width, delay, sequence > length...) so that it could work,
For this sort of interpolation, you want your triangle waveform to ramp from zero to one in 4 samples, so that when the triangle is at its maximum, the adjacent input samples (4 samples away at your increased sample rate) are zeroed out. So, the vector would look like: t = [0 0.25 0.5 0.75 1 0.75 0.5 0.25 0]; I haven't worked the math out fully, but you should be able to get the expected result by performing the convolution on the entire spectrum. It might be easier to do if you use fftshift() to put the frequencies in a more intuitive order beforehand (you will have to undo it before you ifft() though). If I get a chance, I'll try to work up this example in MATLAB. The zeros on the end are just for clarity and obviously can be omitted. As you pointed out, dividing the sinc^2 out at the end isn't really an option; you'll run into numerical stability issues when dividing by numbers close to zero. You might try a "smarter" interpolation method; since ideal interpolation uses sinc functions, you might use an approximation to sinc as your interpolant instead (sinc functions are non-causal and infinite length, so you can't do it exactly). Since you seem to have suggested that this isn't a real-time application, you could get away with non-causality and just truncate the sequence to a suitable length. Good luck. Jason