> this notation is an impediment, not useful.
Sorry about that, I did my best to describe my computation in full to
avoid misunderstandings about the implementation.
Thanks for still taking the time to parse it. Are there guidelines for
ASCII DSP notation that you could point me to?
> X[k] = DFT{ x[n] }
Yep. Discrete FT. Sorry.
>> Q3 - Are the following the minimum sufficient conditions for correctly
>> computing a, p and f :
>> - x has a bandwidth below fs÷2 (Nyquist criterion)
>> - x has a total bandwidth smaller than an octave (cf.
>>
http://dsp.stackexchange.com/questions/2914/does-taking-the-hilbert-transform-extract-the-envelope-of-audio-signals
>>
>> )
>>
>
> i don't know. having a bandwidth of less than an octave does not mean
> that it's quasi-sinusoidal, but it's sorta getting there. if the
> bandwidth is *much* less than an octave, maybe it's sufficiently
> sinusoidal to be considered "quasi-sinusoidal".
Could you do one of the following:
- point me to a definition of "quasi-sinusoidal"
- give me a formula for the minimum bandwidth that makes a signal
"quasi-sinusoidal"
- give me a formula that explains how the bandwidth impacts the
approximation of the instantaneous amplitude and frequency
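To make that question concrete, here is a small Python/scipy sketch of the failure mode I am asking about (the carrier and modulator frequencies are arbitrary test values, not anything from my real signal): the Hilbert envelope a[t] of an AM tone comes out almost exactly when the signal is narrowband, but breaks down once the sidebands fold past 0 Hz:

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000.0                       # sampling frequency in Hz (arbitrary)
t = np.arange(8000) / fs          # one second, chosen so the test tones are periodic

def envelope(x):
    """a[t] = |z[t]|: magnitude of the analytic signal x + j*HT(x)."""
    return np.abs(hilbert(x))

fc = 1000.0                       # carrier frequency, Hz
errors = {}
for fm in (50.0, 1200.0):         # modulator well below / above the carrier
    true_env = 1.0 + 0.5 * np.cos(2 * np.pi * fm * t)
    x = true_env * np.cos(2 * np.pi * fc * t)
    errors[fm] = np.max(np.abs(envelope(x) - true_env))
    print(f"fm = {fm:6.0f} Hz -> max envelope error {errors[fm]:.3g}")
```

With fm = 50 Hz the sidebands sit at 950 and 1050 Hz (far less than an octave of total bandwidth) and the recovered envelope matches to machine precision; with fm = 1200 Hz one sideband folds to 200 Hz and the envelope estimate is visibly wrong. I would like a formula that predicts this error from the bandwidth.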
> if a(t) is a slowly varying amplitude function (which you don't want
> changed), then changing the phase function so that it represents a
> different instantaneous frequency than what goes in, that is frequency
> shifting. for a sinusoid with no harmonics, "frequency shifting" is
> pretty much the same thing as "pitch shifting" if you define the
> resulting instantaneous frequency the same.
Could you define "slowly"?
> what muscial harmony if your bandwidth is so limited?
Well, I planned to split the original signal into sub-octave bands with
a filterbank, pitch-shift each band, and then recompose them.
Under which conditions could that work?
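For the record, here is a sketch of the split/recompose step I have in mind (Python; since I work non-causally on the fully-known signal, I simply mask FFT bins per octave — the band edges and test frequencies below are placeholders):

```python
import numpy as np

def split_octaves(x, fs, f_lo, f_hi):
    """Split x into octave-wide bands by masking FFT bins.
    Non-causal, but that is fine: the whole signal is known in advance."""
    n = len(x)
    freqs = np.abs(np.fft.fftfreq(n, d=1.0 / fs))   # bin frequencies, Hz
    X = np.fft.fft(x)
    edges = [f_lo]                                  # octave band edges
    while edges[-1] * 2 < f_hi:
        edges.append(edges[-1] * 2)
    edges.append(f_hi)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)         # keep exactly one octave
        bands.append(np.fft.ifft(X * mask).real)
    return bands

fs = 8192.0
t = np.arange(8192) / fs
x = np.cos(2 * np.pi * 440 * t) + 0.5 * np.cos(2 * np.pi * 1320 * t)
bands = split_octaves(x, fs, 100.0, 4000.0)
y = sum(bands)            # recomposition is just the sum of the bands
print(len(bands), np.max(np.abs(y - x)))
```

Since the bin masks are disjoint and cover the whole analysis range, summing the unmodified bands reconstructs x exactly; my question is what happens to that reconstruction once each band has been pitch-shifted.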
> the way we usually do pitch shifting on audio is different from what it
> appears you're describing here.
Yep. I would like to do it through the frequency domain for three reasons:
- I don't need a real-time implementation (I am doing non-causal processing)
- I don't care (yet) about the efficiency of the algorithm
- If correctly implemented, I believe it would allow me to do dynamic
harmonic pitch-shifting, that is, actual harmonic modulation. Am I
doing it wrong?
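Concretely, my PS(z) computation would look like this in Python (using scipy's hilbert; the 440 Hz test tone and m = 1.5 are arbitrary illustration values):

```python
import numpy as np
from scipy.signal import hilbert

def pitch_shift_band(x, m):
    """PS(z) from my notation: keep a[t], scale the instantaneous
    frequency f[t] by m, and re-integrate the phase. Only meaningful
    if x is already narrowband / quasi-sinusoidal."""
    z = hilbert(x)                           # z[t] = x[t] + j*HT(x)[t]
    a = np.abs(z)                            # instantaneous amplitude a[t]
    p = np.unwrap(np.angle(z)) / (2 * np.pi) # instantaneous phase, in cycles
    f = np.diff(p, prepend=0.0)              # inst. frequency, cycles/sample
    return a * np.cos(2 * np.pi * np.cumsum(m * f))  # real part of PS(z)

fs = 8192.0
t = np.arange(8192) / fs
x = np.cos(2 * np.pi * 440.0 * t)   # plain 440 Hz tone, periodic in the window
y = pitch_shift_band(x, 1.5)        # expect a tone near 660 Hz
```

On a single tone this behaves as I expect (440 Hz in, 660 Hz out with unchanged amplitude); my questions above are about when it stops behaving on real, wider-band audio.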
Thanks for your feedback!
Nic.
robert bristow-johnson wrote:
> On 12/14/13 11:14 AM, Nicolas Delcros wrote:
>> Hi all !
>>
>> My goal is to implement non-causal harmonic pitch-shifting on a
>> fully-known real audio signal.
>>
>> * denotes multiplication, ÷ denotes division, exp() denotes
>> exponentiation.
>>
>
> so far i don't mind your notational convention because it is
> conventional. my keyboard doesn't have the ÷ key, and i don't like
> cutting and pasting.
>
> but the notation below is something i really don't want to parse. ASCII
> math is a problem, but i don't think your notation is the solution.
>
>> (a J b) denotes the complex number of real part a and of imaginary part
>> b. If a is omitted it defaults to 0, and b to 1, so that J on its own
>> denotes (0J1).
>> (m Z p) denotes the complex number of magnitude m and phase p. I will
>> use unitary phase, so that p in radians is (2pi*p), and the complex
>> value is m*exp(J*2pi*p).
>>
>> )x;y) denotes the real numbers between x (excluded) and y (included).
>> [i;j[ denotes the integers between i (included) and j (excluded), and
>> matches ]i-1;j-1].
>> +/i∊[0;n[ denotes the discrete summation of an expression over i∊[0;n[.
>>
>> Let x be an arbitrary discrete vector of real samples taken linearly
>> through time. Let n be its size and fs its sampling frequency in hertz.
>> The value of x at time (t÷fs) (in seconds) is x[t] for t∊[0;n[.
>>
>> The Fourier transform of x, denoted X = FT(x), is defined at frequency
>> (f*fs÷n) for f∊[0;n[ by :
>> X[f] = FT(x)[f] = +/t∊[0;n[ x[t] Z -(t÷n)
>
>
> why don't we do this. let's keep with most of the lit and define
> continuous functions of time or frequency with parenths:
>
> X(f) = FT{ x(t) }
>
> and discrete time or frequency with integer arguments surrounded by
> square brackets:
>
> X[k] = DFT{ x[n] }
>
> there are other transforms, the DTFT, Z (or Fourier series) that mix
> discrete time (or frequency) with continuous frequency (or time).
>
>>
>> The inverse Fourier transform of X, denoted x = IFT(X), is defined at
>> time (t÷fs) for t∊[0;n[ by :
>> x[t] = IFT(X)[t] = +/f∊[0;n[ X[f] Z (f÷n)
>>
>> The signum of f, denoted sig(f), is 1 for f∊]0;n÷2[, -1 for f∊]n÷2;n[,
>> and 0 for f=0 and f=n÷2.
>> The Hilbert transform of x, denoted HT(x), is IFT(-J*sig(f)*FT(x)[f]),
>> that is taking the Fourier transform of x, rotating all frequency phases
>> by 1÷4 (that is 90 degrees or pi÷2 radians), and taking the inverse
>> Fourier transform.
>>
>> The instantaneous analytic form of x[t], denoted z[t] is defined by :
>> z[t] = x[t] J HT(x)[t]
>
> this notation is an impediment, not useful.
>
>>
>> a[t] denotes the magnitude of z[t]. and is called the instantaneous
>> amplitude of x[t].
>> p[t] denotes the phase of z[t], and is called the instantaneous phase of
>> x[t] in cycles
>> f[t] denotes the derivative of p[t], and is called the instantaneous
>> frequency of x[t] in cycles per sample.
>> So f[t] = p[t]-p[t-1] for t∊[0;n], considering that p[-1]=0 (so f[0] is
>> p[0]), and p[t] = +/i∊[0;t] f[i]
>>
>> The pitch-shifting of z[t] by a factor m, denoted PS(z), is defined by :
>> PS(z)[t] = a[t] Z ( +/[i∊0;t] m*f[t] )
>> And the resulting audio signal is its real part.
>>
>>
>> Questions :
>>
>> Q1 - Is it correct that for any discrete real signal x, HT(x)[t] is
>> strictly real if the computer has infinite precision ?
>
> the Hilbert transform of a purely real signal is purely real.
>
>>
>> Q2 - Is it correct that the proposed computation of the instantaneous
>> frequency of f[t], using a unitary phase, yields a value in Hertz of
>> (fs*f[t]) ?
>
> if you compute the analytic signal of a time-varying quasi-*sinusoid*
> (something more specific than a more general quasi-periodic signal you
> might get in audio), and unwrap the phase, the derivative of the phase
> vs. time function does get you the instantaneous frequency. there are
> ways of computing the instantaneous frequency directly from the real and
> imag parts that avoids the need for unwrapping.
>
>> Q3 - Are the following the minimum sufficient conditions for correctly
>> computing a, p and f :
>> - x has a bandwidth below fs÷2 (Nyquist criterion)
>> - x has a total bandwidth smaller than an octave (cf.
>> http://dsp.stackexchange.com/questions/2914/does-taking-the-hilbert-transform-extract-the-envelope-of-audio-signals
>>
>> )
>>
>
> i don't know. having a bandwidth of less than an octave does not mean
> that it's quasi-sinusoidal, but it's sorta getting there. if the
> bandwidth is *much* less than an octave, maybe it's sufficiently
> sinusoidal to be considered "quasi-sinusoidal".
>
>> Q4 - Are the following the minimum sufficient conditions for correctly
>> computing PS(z) :
>> - f[t] is correctly computed
>> - m*f[t] is smaller than 1÷2 (Nyquist criterion - the output
>> instantaneous frequency in Hertz is less than fs÷2 - the phase doesn't
>> jump by more than pi radians)
>
> i can't really decode your notation for PS and i don't care to.
>
> if a(t) is a slowly varying amplitude function (which you don't want
> changed), then changing the phase function so that it represents a
> different instantaneous frequency than what goes in, that is frequency
> shifting. for a sinusoid with no harmonics, "frequency shifting" is
> pretty much the same thing as "pitch shifting" if you define the
> resulting instantaneous frequency the same.
>
>>
>> Q5 - If correctly computed, will the real part of PS(z) preserve the
>> musical harmony of the original real audio signal ?
>>
>
> what musical harmony if your bandwidth is so limited?
>
>> Q6 - What is the best way to break x into signals of bandwidth smaller
>> than an octave, and then recompose them after individually
>> pitch-shifting them ?
>
> filter bank.
>
>> Yep, sorry, this question is a bit wide, that is inversely proportional
>> to my understanding of the problem.
>>
>>
>>
>>
>> Thanks a lot for enlightening me =)
>>
>
> the way we usually do pitch shifting on audio is different from what it
> appears you're describing here.
>
> usually it's equivalent to the analog speeding up or slowing down the
> tape, and then using splicing to make the "tape" longer or short to
> compensate for the speed and get the tempo back to the original.
>
> this can be done glitch-free with a frequency-domain method like the
> phase vocoder, or it can be done in the time domain with a pitch
> detector and seamless splicing (or as seamless as possible).
>
>