DSPRelated.com
Forums

Request for comments : Non-causal pitch modulation via instantaneous frequency

Started by Nicolas Delcros December 14, 2013
Hi all !

My goal is to implement non-causal harmonic pitch-shifting on a 
fully-known real audio signal.

* denotes multiplication, ÷ denotes division, exp() denotes exponentiation.

(a J b) denotes the complex number of real part a and of imaginary part 
b. If a is omited it defaults to 0, and b to 1, so that J on itw own 
denotes (0J1).
(m Z p) denotes the complex number of magnitude m and phase p. I will 
use unitary phase, so that p in radians is (2pi*p), and the complex 
value is m*exp(J*2pi*p).

)x;y) denotes the real numbers between x (excluded) and y (included).
[i;j[ denotes the integers between i (included) and j (excluded), and 
matches ]i-1;j-1].
+/i∊[0;n[ denotes the discrete summation of an expression over i∊[0;n[.

Let x be an arbitrary discrete vector of real samples taken linearly 
through time. Let n be its size and fs its sampling frequency in hertz.
The value of x at time (t÷fs) (in seconds) is x[t] for t∊[0;n[.

The Fourier transform of x, denoted X = FT(x), is defined at frequency 
(f*fs÷n) for f∊[0;n[ by :
X[f] = FT(x)[f] = +/t∊[0;n[ x[t] Z -(t÷n)

The inverse Fourier transform of X, denoted x = IFT(X), is defined at 
time (t÷fs) for t∊[0;n[ by :
x[t] = IFT(X)[t] = +/f∊[0;n[ X[f] Z (f÷n)

The signum of f, denoted sig(f), is 1 for f∊]0;n÷2[, -1 for f∊]n÷2;n[, 
and 0 for f=0 and f=n÷2.
The Hilbert transform of x, denoted HT(x), is IFT(-J*sig(f)*FT(x)[f]), 
that is taking the Fourier transform of x, rotating all frequency phases 
by 1÷4 (that is 90 degrees or pi÷2 radians), and taking the inverse 
Fourier transform.

The instantaneous analytic form of x[t], denoted z[t] is defined by :
z[t] = x[t] J HT(x)[t]

a[t] denotes the magnitude of z[t]. and is called the instantaneous 
amplitude of x[t].
p[t] denotes the phase of z[t], and is called the instantaneous phase of 
x[t] in cycles
f[t] denotes the derivate of p[t], and is called the instataneous 
frequency of x[t] in cycles per sample.
So f[t] = p[t]-p[t-1] for t∊[0;n], considering that p[-1]=0 (so f[0] is 
p[0]), and p[t] = +/i∊[0;t] f[i]

The pitch-shifting of z[t] by a factor m, denoted PS(z), is defined by :
PS(z)[t] = a[t] Z ( +/[i∊0;t] m*f[t] )
And the resulting audio signal is its real part.


Questions :

Q1 - Is it correct that for any discrete real signal x, HT(x)[t] is 
strictly real if the computer has infinite precision ?

Q2 - Is it correct that the proposed computation of the instantaneous 
frequency of f[t], using a unitary phase, yields a value in Hertz of 
(fs*f[t]) ?

Q3 - Are the following the minimum sufficient conditions for correctly 
computing a, p and f :
- x has a bandwidth below fs÷2 (Nyquist criterion)
- x has a total bandwitdh smaller than an octave (cf. 
http://dsp.stackexchange.com/questions/2914/does-taking-the-hilbert-transform-extract-the-envelope-of-audio-signals 
)

Q4 - Are the following the mininum sufficient conditions for correctly 
computing PS(z) :
- f[t] is correctly computed
- m*f[t] is smaller than 1÷2 (Nyquist criterion - the output 
instantaneous frequency in Hertz is less than fs÷2 - the phase doesn't 
jump by more than pi radians)

Q5 - If correctly computed, will the real part of PS(z) preserve the 
musical harmony of the original real audio signal ?

Q6 - What is the best way to break x into signals of bandwidth smaller 
than an octave, and then recompose them after individually 
pitch-shifting them ?
Yep, sorry, this question is a bit wide, that is inversly proportional 
to my understanding of the problem.




Thanks a lot for enlightening me =)

Nic.
On 12/14/13 11:14 AM, Nicolas Delcros wrote:
> Hi all ! > > My goal is to implement non-causal harmonic pitch-shifting on a > fully-known real audio signal. > > * denotes multiplication, ÷ denotes division, exp() denotes exponentiation. >
so far i don't mind your notational convention because it is conventional. my keyboard doesn't have the ÷ key, and i don't like cutting and pasting. but the notation below is something i really don't want to parse. ASCII math is a problem, but i don't think your notation is the solution.
> (a J b) denotes the complex number of real part a and of imaginary part > b. If a is omited it defaults to 0, and b to 1, so that J on itw own > denotes (0J1). > (m Z p) denotes the complex number of magnitude m and phase p. I will > use unitary phase, so that p in radians is (2pi*p), and the complex > value is m*exp(J*2pi*p). > > )x;y) denotes the real numbers between x (excluded) and y (included). > [i;j[ denotes the integers between i (included) and j (excluded), and > matches ]i-1;j-1]. > +/i∊[0;n[ denotes the discrete summation of an expression over i∊[0;n[. > > Let x be an arbitrary discrete vector of real samples taken linearly > through time. Let n be its size and fs its sampling frequency in hertz. > The value of x at time (t÷fs) (in seconds) is x[t] for t∊[0;n[. > > The Fourier transform of x, denoted X = FT(x), is defined at frequency > (f*fs÷n) for f∊[0;n[ by : > X[f] = FT(x)[f] = +/t∊[0;n[ x[t] Z -(t÷n)
why don't we do this. let's keep with most of the lit and define continuous functions of time or frequency with parenths: X(f) = FT{ x(t) } and discrete time or frequency with integer arguments surrounded by square brackets: X[k] = DFT{ x[n] } there are other transforms, the DTFT, Z (or Fourier series) that mix discrete time (or frequency) with continuous frequency (or time).
> > The inverse Fourier transform of X, denoted x = IFT(X), is defined at > time (t÷fs) for t∊[0;n[ by : > x[t] = IFT(X)[t] = +/f∊[0;n[ X[f] Z (f÷n) > > The signum of f, denoted sig(f), is 1 for f∊]0;n÷2[, -1 for f∊]n÷2;n[, > and 0 for f=0 and f=n÷2. > The Hilbert transform of x, denoted HT(x), is IFT(-J*sig(f)*FT(x)[f]), > that is taking the Fourier transform of x, rotating all frequency phases > by 1÷4 (that is 90 degrees or pi÷2 radians), and taking the inverse > Fourier transform. > > The instantaneous analytic form of x[t], denoted z[t] is defined by : > z[t] = x[t] J HT(x)[t]
this notation is an impediment, not useful.
> > a[t] denotes the magnitude of z[t]. and is called the instantaneous > amplitude of x[t]. > p[t] denotes the phase of z[t], and is called the instantaneous phase of > x[t] in cycles > f[t] denotes the derivative of p[t], and is called the instantaneous > frequency of x[t] in cycles per sample. > So f[t] = p[t]-p[t-1] for t∊[0;n], considering that p[-1]=0 (so f[0] is > p[0]), and p[t] = +/i∊[0;t] f[i] > > The pitch-shifting of z[t] by a factor m, denoted PS(z), is defined by : > PS(z)[t] = a[t] Z ( +/[i∊0;t] m*f[t] ) > And the resulting audio signal is its real part. > > > Questions : > > Q1 - Is it correct that for any discrete real signal x, HT(x)[t] is > strictly real if the computer has infinite precision ?
the Hilbert transform of a purely real signal is purely real.
> > Q2 - Is it correct that the proposed computation of the instantaneous > frequency of f[t], using a unitary phase, yields a value in Hertz of > (fs*f[t]) ?
if you compute the analytic signal of a time-varying quasi-*sinusoid* (something more specific than a more general quasi-periodic signal you might get in audio), and unwrap the phase, the derivative of the phase vs. time function does get you the instantaneous frequency. there are ways of computing the instantaneous frequency directly from the real and imag parts that avoids the need for unwrapping.
> Q3 - Are the following the minimum sufficient conditions for correctly > computing a, p and f : > - x has a bandwidth below fs÷2 (Nyquist criterion) > - x has a total bandwitdh smaller than an octave (cf. > http://dsp.stackexchange.com/questions/2914/does-taking-the-hilbert-transform-extract-the-envelope-of-audio-signals > ) >
i don't know. having a bandwidth of less than an octave does not mean that it's quasi-sinusoidal, but it's sorta getting there. if the bandwidth is *much* less than an octave, maybe it's sufficiently sinusoidal to be considered "quasi-sinusoidal".
> Q4 - Are the following the minimum sufficient conditions for correctly > computing PS(z) : > - f[t] is correctly computed > - m*f[t] is smaller than 1÷2 (Nyquist criterion - the output > instantaneous frequency in Hertz is less than fs÷2 - the phase doesn't > jump by more than pi radians)
i can't really decode your notation for PS and i don't care to. if a(t) is a slowly varying amplitude function (which you don't want changed), then changing the phase function so that it represents a different instantaneous frequency than what goes in, that is frequency shifting. for a sinusoid with no harmonics, "frequency shifting" is pretty much the same thing as "pitch shifting" if you define the resulting instantaneous frequency the same.
> > Q5 - If correctly computed, will the real part of PS(z) preserve the > musical harmony of the original real audio signal ? >
what muscial harmony if your bandwidth is so limited?
> Q6 - What is the best way to break x into signals of bandwidth smaller > than an octave, and then recompose them after individually > pitch-shifting them ?
filter bank.
> Yep, sorry, this question is a bit wide, that is inversely proportional > to my understanding of the problem. > > > > > Thanks a lot for enlightening me =) >
the way we usually do pitch shifting on audio is different from what it appears you're describing here. usually it's equivalent to the analog speeding up or slowing down the tape, and then using splicing to make the "tape" longer or short to compensate for the speed and get the tempo back to the original. this can be done glitch-free with a frequency-domain method like the phase vocoder, or it can be done in the time domain with a pitch detector and seamless splicing (or as seamless as possible). -- r b-j rbj@audioimagination.com "Imagination is more important than knowledge."
 > this notation is an impediment, not useful.

Sorry about that, I did my best to fully describe my computation to 
avoid misunderstandings/quiproquos about the implementation.

Thanks for still taking the time to parse it. Are there guidelines for 
ASCII DSP notations that you could point me at ?



 >    X[k]  =   DFT{ x[n] }

Yep. Discrete FT. Sorry.



 >> Q3 - Are the following the minimum sufficient conditions for correctly
 >> computing a, p and f :
 >> - x has a bandwidth below fs÷2 (Nyquist criterion)
 >> - x has a total bandwitdh smaller than an octave (cf.
 >> 
http://dsp.stackexchange.com/questions/2914/does-taking-the-hilbert-transform-extract-the-envelope-of-audio-signals 

 >>
 >> )
 >>
 >
 > i don't know.  having a bandwidth of less than an octave does not mean
 > that it's quasi-sinusoidal, but it's sorta getting there.  if the
 > bandwidth is *much* less than an octave, maybe it's sufficiently
 > sinusoidal to be considered "quasi-sinusoidal".

could you either :
- point me at a definition of "quasi sinusoidal"
- give me a forula for the minimum bandwidth that makes the signal 
"quasi-sinusoidal"
- give me a formula that explains how the bandwitdth impacts the 
approximation of instantaneous amplitude and frequency



 > if a(t) is a slowly varying amplitude function (which you don't want
 > changed), then changing the phase function so that it represents a
 > different instantaneous frequency than what goes in, that is frequency
 > shifting.  for a sinusoid with no harmonics, "frequency shifting" is
 > pretty much the same thing as "pitch shifting" if you define the
 > resulting instantaneous frequency the same.

could you define "slowly" ?



 > what muscial harmony if your bandwidth is so limited?

Well I planned to apply the pitch shifting to a filterbank on slices of 
bandwidth of the original signal, and then recompose it.
Under which conditions could that work ?




 > the way we usually do pitch shifting on audio is different from what it
 > appears you're describing here.

Yep. I would like to do it through the frequency domain for two reasons :
- I don't need a real-time implementation (i am doing non-causal processing)
- I don't care (yet) about the efficiency of the algorithm
- If correctly implemented, I believe it would allow me to do dynamic 
harmonic pitch-shifting, that is actually harmonic modulation. Am I 
doing it wrong ?



Thanks for your feedback !


Nic.






robert bristow-johnson wrote:
> On 12/14/13 11:14 AM, Nicolas Delcros wrote: >> Hi all ! >> >> My goal is to implement non-causal harmonic pitch-shifting on a >> fully-known real audio signal. >> >> * denotes multiplication, ÷ denotes division, exp() denotes >> exponentiation. >> > > so far i don't mind your notational convention because it is > conventional. my keyboard doesn't have the ÷ key, and i don't like > cutting and pasting. > > but the notation below is something i really don't want to parse. ASCII > math is a problem, but i don't think your notation is the solution. > >> (a J b) denotes the complex number of real part a and of imaginary part >> b. If a is omited it defaults to 0, and b to 1, so that J on itw own >> denotes (0J1). >> (m Z p) denotes the complex number of magnitude m and phase p. I will >> use unitary phase, so that p in radians is (2pi*p), and the complex >> value is m*exp(J*2pi*p). >> >> )x;y) denotes the real numbers between x (excluded) and y (included). >> [i;j[ denotes the integers between i (included) and j (excluded), and >> matches ]i-1;j-1]. >> +/i∊[0;n[ denotes the discrete summation of an expression over i∊[0;n[. >> >> Let x be an arbitrary discrete vector of real samples taken linearly >> through time. Let n be its size and fs its sampling frequency in hertz. >> The value of x at time (t÷fs) (in seconds) is x[t] for t∊[0;n[. >> >> The Fourier transform of x, denoted X = FT(x), is defined at frequency >> (f*fs÷n) for f∊[0;n[ by : >> X[f] = FT(x)[f] = +/t∊[0;n[ x[t] Z -(t÷n) > > > why don't we do this. let's keep with most of the lit and define > continuous functions of time or frequency with parenths: > > X(f) = FT{ x(t) } > > and discrete time or frequency with integer arguments surrounded by > square brackets: > > X[k] = DFT{ x[n] } > > there are other transforms, the DTFT, Z (or Fourier series) that mix > discrete time (or frequency) with continuous frequency (or time). > >> >> The inverse Fourier transform of X, denoted x = IFT(X), is defined at >> time (t÷fs) for t∊[0;n[ by : >> x[t] = IFT(X)[t] = +/f∊[0;n[ X[f] Z (f÷n) >> >> The signum of f, denoted sig(f), is 1 for f∊]0;n÷2[, -1 for f∊]n÷2;n[, >> and 0 for f=0 and f=n÷2. >> The Hilbert transform of x, denoted HT(x), is IFT(-J*sig(f)*FT(x)[f]), >> that is taking the Fourier transform of x, rotating all frequency phases >> by 1÷4 (that is 90 degrees or pi÷2 radians), and taking the inverse >> Fourier transform. >> >> The instantaneous analytic form of x[t], denoted z[t] is defined by : >> z[t] = x[t] J HT(x)[t] > > this notation is an impediment, not useful. > >> >> a[t] denotes the magnitude of z[t]. and is called the instantaneous >> amplitude of x[t]. >> p[t] denotes the phase of z[t], and is called the instantaneous phase of >> x[t] in cycles >> f[t] denotes the derivative of p[t], and is called the instantaneous >> frequency of x[t] in cycles per sample. >> So f[t] = p[t]-p[t-1] for t∊[0;n], considering that p[-1]=0 (so f[0] is >> p[0]), and p[t] = +/i∊[0;t] f[i] >> >> The pitch-shifting of z[t] by a factor m, denoted PS(z), is defined by : >> PS(z)[t] = a[t] Z ( +/[i∊0;t] m*f[t] ) >> And the resulting audio signal is its real part. >> >> >> Questions : >> >> Q1 - Is it correct that for any discrete real signal x, HT(x)[t] is >> strictly real if the computer has infinite precision ? > > the Hilbert transform of a purely real signal is purely real. > >> >> Q2 - Is it correct that the proposed computation of the instantaneous >> frequency of f[t], using a unitary phase, yields a value in Hertz of >> (fs*f[t]) ? > > if you compute the analytic signal of a time-varying quasi-*sinusoid* > (something more specific than a more general quasi-periodic signal you > might get in audio), and unwrap the phase, the derivative of the phase > vs. time function does get you the instantaneous frequency. there are > ways of computing the instantaneous frequency directly from the real and > imag parts that avoids the need for unwrapping. > >> Q3 - Are the following the minimum sufficient conditions for correctly >> computing a, p and f : >> - x has a bandwidth below fs÷2 (Nyquist criterion) >> - x has a total bandwitdh smaller than an octave (cf. >> http://dsp.stackexchange.com/questions/2914/does-taking-the-hilbert-transform-extract-the-envelope-of-audio-signals >> >> ) >> > > i don't know. having a bandwidth of less than an octave does not mean > that it's quasi-sinusoidal, but it's sorta getting there. if the > bandwidth is *much* less than an octave, maybe it's sufficiently > sinusoidal to be considered "quasi-sinusoidal". > >> Q4 - Are the following the minimum sufficient conditions for correctly >> computing PS(z) : >> - f[t] is correctly computed >> - m*f[t] is smaller than 1÷2 (Nyquist criterion - the output >> instantaneous frequency in Hertz is less than fs÷2 - the phase doesn't >> jump by more than pi radians) > > i can't really decode your notation for PS and i don't care to. > > if a(t) is a slowly varying amplitude function (which you don't want > changed), then changing the phase function so that it represents a > different instantaneous frequency than what goes in, that is frequency > shifting. for a sinusoid with no harmonics, "frequency shifting" is > pretty much the same thing as "pitch shifting" if you define the > resulting instantaneous frequency the same. > >> >> Q5 - If correctly computed, will the real part of PS(z) preserve the >> musical harmony of the original real audio signal ? >> > > what muscial harmony if your bandwidth is so limited? > >> Q6 - What is the best way to break x into signals of bandwidth smaller >> than an octave, and then recompose them after individually >> pitch-shifting them ? > > filter bank. > >> Yep, sorry, this question is a bit wide, that is inversely proportional >> to my understanding of the problem. >> >> >> >> >> Thanks a lot for enlightening me =) >> > > the way we usually do pitch shifting on audio is different from what it > appears you're describing here. > > usually it's equivalent to the analog speeding up or slowing down the > tape, and then using splicing to make the "tape" longer or short to > compensate for the speed and get the tempo back to the original. > > this can be done glitch-free with a frequency-domain method like the > phase vocoder, or it can be done in the time domain with a pitch > detector and seamless splicing (or as seamless as possible). > >
On 12/14/13 2:07 PM, Nicolas Delcros wrote:
> > this notation is an impediment, not useful. > > Sorry about that, I did my best to fully describe my computation to > avoid misunderstandings/quiproquos about the implementation. > > Thanks for still taking the time to parse it. Are there guidelines for > ASCII DSP notations that you could point me at ? >
probably none that we all agree on. some folks here will use LaTeX notation, which is unambiguous but does not necessarily look like the math (and i don't know of any USENET reader that renders LaTeX in posts). i just try to make the ASCII math look like the math in the books assuming mono-spacing and substituting words like "SUM" for the big sigma and "integral" for the integral symbol. i also use "*" for _multiplication_ (i use "(*)" for convolution, but seldom) and "^" for exponents. sometimes i leave out the "*" for multiplication when it is clear otherwise. like the lit, i use parentheses surrounding continuous-variable arguments, like "f(x)", and square brackets for discrete-variable arguments (which are the same as indices) like "x[n]". sometimes i use underscore for indices (like "a_0, a_1, a_2") and sometimes i leave the underscore off (like "a0, a1, a2"). so for continuous time, continuous frequency, here is an example: +inf X(f) = integral{ x(t) e^(-j*2*pi*f*t) dt} -inf and for discrete time and discrete frequency N-1 X[k] = SUM{ x[n] e^(-j*2*pi*k*n/N) } n=0 connecting the two might be: x[n] = x(nT) or +inf x(t) = SUM{ x[n] sinc( (t-nT)/T ) } n=-inf viewed with a mono-spaced font, i think this looks pretty clear, but i know in the past some folks here (like Eric) have not concurred. -- r b-j rbj@audioimagination.com "Imagination is more important than knowledge."