
Time Scaling - An iterative approach

Started by Michael Plet March 16, 2017
Hi Group

Here is a method of time scaling that you may or may not have heard
of.
I have used it a few times and like it.

Normally time scaling (changing the duration without changing the
pitch) is done using the STFT, that is, overlapping DFT frames.
In order to time scale, different amounts of overlap are used for
analysis and synthesis.
If a frame size of N, an analysis hop size of Ra and a synthesis hop
size of Rs are used, then the scaling factor is Rs/Ra.
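A small worked check of the scaling factor follows. It assumes something the post does not state: only frames lying entirely inside the signal are used, with no zero-padding. F frames taken every Ra samples span the input, and the same F frames overlap-added every Rs samples span the output:

```latex
L_{\text{in}} = (F-1)\,R_a + N, \qquad
L_{\text{out}} = (F-1)\,R_s + N, \qquad
\frac{L_{\text{out}}}{L_{\text{in}}} \to \frac{R_s}{R_a} \quad (F \to \infty)
```

For example, with N = 2048, Ra = 256, Rs = 384 and F = 101 frames: L_in = 100*256 + 2048 = 27648 and L_out = 100*384 + 2048 = 40448. That is a ratio of about 1.463, which tends to 1.5 as the signal gets longer.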

When changing the hop size used for synthesis without changing the
phase in the synthesis bins, there will be phase discontinuities.
So for each frame the phase values must be adjusted.
This can be done by using a complicated phase propagation formula.
The problem with this is that it keeps each bin's phase coherent from
one frame to the next, but phase coherence between bins is lost.
This can be solved by appropriate initial phase selection.
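The post does not say which phase propagation formula it means. The most common one is the textbook phase-vocoder update, sketched below as an assumption. For each bin k, the measured phase advance over Ra is compared with the advance expected at the bin centre. The wrapped deviation gives an instantaneous-frequency estimate, and the synthesis phase is advanced by that frequency times Rs. The names `princarg` and `propagate_phase` are hypothetical. Because every bin is updated independently, nothing ties neighbouring bins together, and this is the loss of coherence between bins described above.

```python
import math

TWO_PI = 2.0 * math.pi


def princarg(phase):
    """Wrap a phase value into the interval [-pi, pi)."""
    return (phase + math.pi) % TWO_PI - math.pi


def propagate_phase(prev_ana, cur_ana, prev_syn, N, Ra, Rs):
    """Textbook phase-vocoder phase propagation for one frame (a sketch).

    prev_ana, cur_ana: analysis phases of the previous/current frame, one per bin
    prev_syn: synthesis phases already chosen for the previous frame
    Returns the synthesis phases for the current frame.
    """
    cur_syn = []
    for k in range(len(cur_ana)):
        omega_k = TWO_PI * k / N                      # bin centre frequency (rad/sample)
        # Deviation of the measured advance from the advance expected at omega_k
        dev = princarg(cur_ana[k] - prev_ana[k] - omega_k * Ra)
        inst_freq = omega_k + dev / Ra                # instantaneous frequency estimate
        cur_syn.append(prev_syn[k] + inst_freq * Rs)  # advance by the synthesis hop
    return cur_syn
```

For a steady sinusoid lying between bins, the estimate recovers its true frequency, so the synthesis phase advances by exactly that frequency times Rs.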

But there is an easier way - the iterative approach.
The idea is to do synthesis with Rs different from Ra without changing
the phase.
The result will have discontinuities. It is then analyzed again with
the forward STFT (hop size Rs), and the original magnitude is forced
upon the resulting frequency-domain frames while their new phase is
kept.
Now the inverse STFT is performed and then the forward STFT again.
The result is better phase continuity, but also a change in magnitude,
so the procedure is repeated by forcing the original magnitude upon
the frequency-domain frames again.

After about 10 iterations the resulting signal has, in my experience,
good phase coherence both within each bin (from frame to frame) and
across bins.

Here is a Matlab implementation to clarify what I mean:
(stft and istft are forward and inverse STFT.)


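[x, Fs] = audioread('...wav');

% stft()/istft() are user-written helpers, not MATLAB built-ins
% (see the follow-up posts). stft(x, N, R) returns one spectrum per
% frame of length N taken every R samples; istft(Y, R) overlap-adds
% the inverse DFTs of those frames every R samples.
N = 2048;    % frame length
Ra = 256;    % analysis hop size
Rs = 384;    % synthesis hop size (scaling factor Rs/Ra = 1.5)

X = stft(x, N, Ra);    % analysis at hop Ra

Y = X;                 % start with the unmodified phases

for Ix1 = 1:10
    y = istft(Y, Rs);              % synthesize at hop Rs
    Y = stft(y, N, Rs);            % re-analyze at the same hop Rs
    % force the original magnitude, keep the new phase
    % (assumes stft returns as many frames here as in X)
    Y = abs(X).*exp(1j*angle(Y));
end

y = istft(Y, Rs);
soundsc(y, Fs);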



Michael
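The thread never shows the stft/istft helpers. The Python sketch below fills them in under several assumptions:

- A periodic Hann window is used for both analysis and synthesis.
- Spectra are one-sided, with N/2 + 1 bins.
- There is no zero-padding.
- The overlap-add divides by the summed squared window, which is the least-squares inverse STFT of Griffin and Lim.

With these helpers the loop is the same as in the MATLAB code, and it closely resembles the Griffin-Lim iteration. All names are hypothetical. Unlike the MATLAB `istft(Y, Rs)`, `istft` here takes N explicitly. A naive O(N^2) DFT keeps the sketch dependency-free, so it is only practical for short signals and small N. A real implementation would use an FFT.

```python
import cmath
import math


def hann(N):
    """Periodic Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/N)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]


def stft(x, N, hop):
    """One-sided spectra (bins 0..N/2) of Hann-windowed frames, no zero-padding."""
    w = hann(N)
    frames = []
    for start in range(0, len(x) - N + 1, hop):
        seg = [w[n] * x[start + n] for n in range(N)]
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N)
                           for n in range(N))
                       for k in range(N // 2 + 1)])
    return frames


def inverse_dft_real(spec, N):
    """Real length-N frame from a one-sided spectrum (N even), like irfft."""
    frame = []
    for n in range(N):
        # DC and Nyquist appear once; only their real parts are used
        acc = spec[0].real + spec[N // 2].real * (-1) ** n
        for k in range(1, N // 2):
            acc += 2.0 * (spec[k] * cmath.exp(2j * math.pi * k * n / N)).real
        frame.append(acc / N)
    return frame


def istft(frames, N, hop):
    """Least-squares overlap-add: sum(w * frame) / sum(w**2) at each sample."""
    w = hann(N)
    length = (len(frames) - 1) * hop + N
    num = [0.0] * length
    den = [0.0] * length
    for m, spec in enumerate(frames):
        frame = inverse_dft_real(spec, N)
        for n in range(N):
            num[m * hop + n] += w[n] * frame[n]
            den[m * hop + n] += w[n] * w[n]
    # Samples where every covering window value is zero (only n = 0 here) stay 0
    return [a / d if d > 1e-12 else 0.0 for a, d in zip(num, den)]


def time_scale(x, N, Ra, Rs, iterations=10):
    """Python port of the MATLAB loop in the post (assumes len(x) >= N)."""
    X = stft(x, N, Ra)                                  # analysis at hop Ra
    target = [[abs(c) for c in spec] for spec in X]     # original magnitudes
    Y = X                                               # start from original phases
    for _ in range(iterations):
        y = istft(Y, N, Rs)                             # synthesize at hop Rs
        Y = stft(y, N, Rs)                              # re-analyze at hop Rs
        # Force the original magnitude, keep the newly obtained phase
        Y = [[mag * cmath.exp(1j * cmath.phase(c)) for mag, c in zip(mags, spec)]
             for mags, spec in zip(target, Y)]
    return istft(Y, N, Rs)
```

For example, `time_scale(x, 16, 4, 6)` returns (F - 1)*6 + 16 samples for the F analysis frames of x, which is roughly 1.5 times the input length. With Ra == Rs the loop simply reconstructs x.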
On Thursday, March 16, 2017 at 5:04:18 PM UTC-4, Michael Plet wrote:
> Here is a Matlab implementation to clarify what I mean:
> (stft and istft are forward and inverse STFT.)
uhm, i've done STFT with MATLAB for decades, but i hadn't ever heard of functions "stft()" nor "istft()". where are these functions documented?

i can find only https://www.mathworks.com/matlabcentral/fileexchange/54309-stft-and-its-inverse or similar.

r b-j
On Thu, 16 Mar 2017 18:35:58 -0700 (PDT), robert bristow-johnson
<rbj@audioimagination.com> wrote:

>uhm, i've done STFT with MATLAB for decades, but i hadn't ever heard of functions "stft()" nor "istft()". where are these functions documented?
Sorry, I wasn't clear about this. They are not built into MATLAB.

"stft()" creates an array (matrix) of dimensions N/2 by the number of frames (which depends on the length of the signal, the hop size and the block size N). This array contains the spectrogram of all frames.

"istft()" does the reverse. It takes as input an array created by stft() and synthesizes the frames by overlap-add into a real time-domain signal.

I hope this explains it. Otherwise I can provide more details.

Michael
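The array dimensions can be made concrete. The sketch assumes no zero-padding, so frames must fit entirely inside the signal. Under that assumption the frame count is 1 + floor((L - N)/hop) for a signal of L samples. A real-input DFT of even length N has N/2 + 1 distinct bins (DC up to Nyquist). "N/2" suggests the helper drops one of them, but the post does not say which, so this sketch keeps all N/2 + 1. `stft_shape` is a hypothetical helper.

```python
def stft_shape(num_samples, N, hop):
    """(bins, frames) of a one-sided STFT without zero-padding (hypothetical helper)."""
    bins = N // 2 + 1                        # DC ... Nyquist for even N
    if num_samples < N:
        return (bins, 0)                     # not even one full frame fits
    frames = 1 + (num_samples - N) // hop    # frames lying entirely inside the signal
    return (bins, frames)
```

For a 27648-sample input with N = 2048 and Ra = 256, `stft_shape(27648, 2048, 256)` gives (1025, 101).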
On Friday, March 17, 2017 at 5:50:32 AM UTC-4, Michael Plet wrote:
> "stft()" creates an array (matrix) of dimensions N/2 and no. of frames
> (which depends on length of the signal, hop size and block size (N)).
> This array contains the spectrogram of all frames.
>
> "istft()" does the reverse. It takes as input an array created by
> stft() and synthesizes the frames by overlap add into a real time
> domain signal.
>
> I hope this explains it. Otherwise I can provide more details.
so, what's the hop size? and the frame length? (they ain't the same thing, and the *analysis* frame length should not be related to the analysis hop size, but should be related to the *synthesis* hop size, depending on the degree of overlap.)

and what window (if any) is being used in the analysis? is there any window used in synthesis? (again depends on degree of overlap.)

a lot is missing.

r b-j
On Fri, 17 Mar 2017 09:13:40 -0700 (PDT), robert bristow-johnson
<rbj@audioimagination.com> wrote:

>so, what's the hop size? and the frame length?
>
>and what window (if any) is being used in the analysis. is there any window used in synthesis?
Yes, I should have been more specific.

Frame length = N = 2048
Analysis hop size = Ra = 256
Synthesis hop size = Rs = 384

The values are just an example of scaling by a factor of 1.5.

A Hanning filter is being used.

The important thing to notice is how the original magnitude is forced upon the synthesized signal. Doing this repeatedly makes the phase in each bin converge.

Michael
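rbj's point about windows and overlap can be checked numerically. The sketch assumes a periodic Hann window is applied at both analysis and synthesis; the thread only says a Hanning window is used. Under that assumption, plain overlap-add reconstructs correctly only if the shifted squared windows sum to a constant. With N = 2048 that holds at Ra = 256 (N/8). It does not hold exactly at Rs = 384, because N/Rs = 16/3 is not an integer. That is one reason to normalize the overlap-add by the summed squared window rather than by a fixed constant. `squared_window_ripple` is a hypothetical helper.

```python
import math


def hann(N):
    """Periodic Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/N)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]


def squared_window_ripple(N, hop, num_frames=None):
    """Relative ripple (max - min) / mean of the overlap-added squared Hann window.

    Only the interior (at least N samples from either end) is examined,
    where every sample is covered by a full set of overlapping frames.
    """
    if num_frames is None:
        num_frames = 3 * N // hop + 3
    w2 = [v * v for v in hann(N)]
    total = [0.0] * ((num_frames - 1) * hop + N)
    for m in range(num_frames):
        for n in range(N):
            total[m * hop + n] += w2[n]
    interior = total[N:len(total) - N]
    mean = sum(interior) / len(interior)
    return (max(interior) - min(interior)) / mean
```

In this sketch, `squared_window_ripple(2048, 256)` is zero up to rounding, while `squared_window_ripple(2048, 384)` is small but clearly nonzero.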
On Fri, 17 Mar 2017 17:23:23 +0100, Michael Plet <me@home.com> wrote:

>A Hanning filter is being used.
That should be: a Hanning window is being used.

Michael