DSPRelated.com
Forums

Pitch detection in voice (singing)

Started by smuglr May 13, 2005
In article <W_ednVz8vJ1FHhnfRVn-tQ@giganews.com>,
smuglr <d.mcgilvray@elec.gla.ac.uk> wrote:
>I know the general subject of pitch detection has been flogged to death,
>but I am looking specifically for pitch detection in a sung melody.

I'm wondering how that will work with some opera singers, who have such a heavy vibrato in their voice that there's no detectable constant pitch, rather a modulated frequency. (While I like opera, I also never understood why they need to sing that way. Can't they hold a note?)

-A
in article 1115999093.675318.252040@g43g2000cwa.googlegroups.com,
cn@c14sw.de at cn@c14sw.de wrote on 05/13/2005 11:44:

> most formulas I've seen go
>   Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
> or
>   Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),

are you sure about the Abs?

> I think the log-stage makes a difference, though I can't remember
> why.
> Does somebody else know?

i can't say exactly what difference it makes, but i thought the idea was to change a convolution operation into one of addition. if WindowedSignal is the convolution of two sequences, FFT(WindowedSignal) will be the product of the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of the Logs of the FFTs of the two sequences, etc.

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
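
A minimal sketch of the convolution-to-addition point, assuming NumPy and the magnitude-only (real cepstrum) form of the log spectrum. The two sequences, the FFT length and the helper names are hypothetical stand-ins, not anything from the thread. The FFT length is chosen to cover the full linear convolution so that the DFT product rule holds exactly.

```python
import numpy as np

def log_mag(x, n_fft):
    """Log-magnitude DFT of x, zero-padded to n_fft points."""
    return np.log(np.abs(np.fft.fft(x, n_fft)))

def log_additivity_error(a, b, n_fft):
    """Largest gap between log|FFT(a conv b)| and log|FFT(a)| + log|FFT(b)|."""
    y = np.convolve(a, b)  # linear convolution, length len(a) + len(b) - 1
    if n_fft < len(y):
        raise ValueError("n_fft must cover the full linear convolution")
    return np.max(np.abs(log_mag(y, n_fft) - (log_mag(a, n_fft) + log_mag(b, n_fft))))

# Hypothetical stand-ins for an excitation and a vocal-tract response.
rng = np.random.default_rng(0)
excitation = rng.standard_normal(32)
response = rng.standard_normal(16)
err = log_additivity_error(excitation, response, n_fft=64)
print(err)
```

Note that a window applied after the convolution (as in WindowedSignal) makes the relation approximate rather than exact.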
robert bristow-johnson wrote:
> > most formulas I've seen go
> >   Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
> > or
> >   Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),
>
> are you sure about the Abs?

Perhaps Magnitude is the correct term.

> > I think the log-stage makes a difference, though I can't remember
> > why.
> > Does somebody else know?
>
> i can't say exactly what difference it makes, but i thought the idea was to
> change a convolution operation into one of addition. if WindowedSignal is
> the convolution of two sequences, FFT(WindowedSignal) will be the product of
> the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of
> the Logs of the FFTs of the two sequences, etc.

Logarithms are so hard to digest, but thanks for the hint, now i can google up the rest.

Carsten Neubauer
http://www.c14sw.de
Hi!

When you take the IDFT of the log of the magnitude of the DFT you get the
real cepstrum. If X(k) = |X(k)| e^(j*phi(k)), then log(X(k)) = log|X(k)| + j*phi(k).
The IDFT of this is the complex cepstrum. I don't think the phase is
important in pitch detection with the cepstrum since we're looking for
periodic patterns in the magnitude spectrum.
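
A minimal sketch of a real-cepstrum pitch estimate along these lines, assuming NumPy. The Hann window, the small floor inside the log, the search range fmin/fmax and the synthetic test tone are all assumptions made for illustration, not details given in the thread.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: IDFT of the log magnitude of the DFT of a windowed frame."""
    windowed = frame * np.hanning(len(frame))
    # The 1e-12 floor is an assumption; it only guards against log(0).
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)
    return np.fft.ifft(log_mag).real  # imaginary part is rounding noise for real input

def cepstral_pitch(frame, fs, fmin=80.0, fmax=1000.0):
    """Pitch in Hz from the largest cepstral peak between 1/fmax and 1/fmin seconds."""
    c = real_cepstrum(frame)
    q_lo = int(fs / fmax)
    q_hi = int(fs / fmin)
    q_peak = q_lo + int(np.argmax(c[q_lo:q_hi + 1]))
    return fs / q_peak

# Synthetic test tone (hypothetical): 200 Hz fundamental with 1/h harmonic amplitudes.
fs = 16000
t = np.arange(2048) / fs
tone = sum(np.sin(2 * np.pi * 200.0 * h * t) / h for h in range(1, 40))
f0 = cepstral_pitch(tone, fs)
print(f0)
```

The estimate is quantized to whole-sample periods (fs / q_peak), which is the resolution limit a refinement step would then try to improve.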

I used the real cepstrum in a pitch detector recently while doing a
project course at school. The resolution is limited by the DFT length, but
the estimate can be improved by using the phase from the DFT bin which has
been found with the cepstrum, and the phase from the same DFT bin in the
previous window. 
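
One reading of this refinement is the phase-vocoder style estimate: compare the phase of bin k in two successive frames with the advance expected for a tone sitting exactly on the bin. The sketch below assumes that reading, plus NumPy, a Hann window, a hop of a quarter frame and a pure test tone; the function name and parameters are hypothetical.

```python
import numpy as np

def refine_frequency(x, fs, n_fft, hop, k):
    """Refine the frequency of DFT bin k from its phase advance between two frames."""
    window = np.hanning(n_fft)
    X0 = np.fft.rfft(x[:n_fft] * window)
    X1 = np.fft.rfft(x[hop:hop + n_fft] * window)
    expected = 2 * np.pi * k * hop / n_fft       # advance of an exactly bin-centred tone
    dev = np.angle(X1[k]) - np.angle(X0[k]) - expected
    dev = (dev + np.pi) % (2 * np.pi) - np.pi    # wrap the deviation into [-pi, pi)
    return (k + dev * n_fft / (2 * np.pi * hop)) * fs / n_fft

fs, n_fft, hop = 8000, 1024, 256
t = np.arange(n_fft + hop) / fs
x = np.sin(2 * np.pi * 443.0 * t)
k = int(round(443.0 * n_fft / fs))  # stand-in for the bin picked from the cepstral estimate
f_refined = refine_frequency(x, fs, n_fft, hop, k)
print(k * fs / n_fft, f_refined)
```

The wrapped deviation is only unambiguous while the true frequency lies within n_fft / (2 * hop) bins of bin k, so a smaller hop tolerates a coarser initial estimate.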

Perhaps this is old news... I'm just a student who wants to "talk the
talk";)

/ M
		
in article 1116054165.853447.127780@z14g2000cwz.googlegroups.com,
cn@c14sw.de at cn@c14sw.de wrote on 05/14/2005 03:02:

> robert bristow-johnson wrote:
>>> most formulas I've seen go
>>>   Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
>>> or
>>>   Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),
>>
>> are you sure about the Abs?
>
> Perhaps Magnitude is the correct term.

i don't think it belongs there at all, however you name it. i recognize it as the "real cepstrum", but if you're gonna apply them theorems that you see in the back of O&S regarding cepstrum (such as "homomorphic deconvolution of speech" to get the periodic driving function, and thus the period), it looks like to me that you need the "complex cepstrum". being that i never did anything with cepstrum since grad school, i can be wrong about that.

>>> I think the log-stage makes a difference, though I can't remember
>>> why. Does somebody else know?
>>
>> i can't say exactly what difference it makes, but i thought the idea was to
>> change a convolution operation into one of addition. if WindowedSignal is
>> the convolution of two sequences, FFT(WindowedSignal) will be the product of
>> the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of
>> the Logs of the FFTs of the two sequences, etc.
>
> Logarithms are so hard to digest,

why? log(A*B) = log(A) + log(B) . big deel.

> but thanks for the hint,
> now i can google up the rest.

maybe borrow Oppenheim and Schafer if you don't have it.

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
In article <jc-dnQ-Dd59MGhnfRVn-iA@giganews.com>,
smuglr <d.mcgilvray@elec.gla.ac.uk> wrote:
>>I'm currently using auto-correlation to find an estimate of the
>>fundamental frequency, and then using phase-unwrapping to get high
>>resolution.
>
>Meant to say Cepstrum, not auto-correlation - oops!

What window width (in time, milliseconds) are you using with respect to the lowest frequency of interest? I assume that you are doing phase unwrapping using successive frames (overlapped or not?), which means you have some historical data with which to try processing larger frames. If latency is an issue, you might be able to use resampling to increase the frame size by less than a factor of 2X.

Does your error rate at a given fundamental frequency vary with frame size?

Also, are you looking at the magnitude or real part of your real Cepstrum? e.g.

  Re(ifft(log(mag(fft(x)))))
or
  mag(ifft(log(mag(fft(x)))))

IMHO. YMMV.
--
Ron Nicholson rhn AT nicholson DOT com http://www.nicholson.com/rhn/
#include <canonical.disclaimer> // only my own opinions, etc.
robert bristow-johnson wrote:
> in article 1116054165.853447.127780@z14g2000cwz.googlegroups.com,
> cn@c14sw.de at cn@c14sw.de wrote on 05/14/2005 03:02:
>
> > robert bristow-johnson wrote:
> >>> most formulas I've seen go
> >>>   Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
> >>> or
> >>>   Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),
> >>
> >> are you sure about the Abs?
> >
> > Perhaps Magnitude is the correct term.
>
> i don't think it belongs there at all, however you name it. i recognize it
> as the "real cepstrum", but if you're gonna apply them theorems that you see
> in the back of O&S regarding cepstrum (such as "homomorphic deconvolution of
> speech" to get the periodic driving function, and thus the period), it looks
> like to me that you need the "complex cepstrum". being that i never did
> anything with cepstrum since grad school, i can be wrong about that.

I don't know much 'bout speech processing but I did look into the cepstrum a couple of years ago. The problem with the complex cepstrum is phase unwrapping. If you can get away by using the real cepstrum, do that.

> >>> I think the log-stage makes a difference, though I can't remember
> >>> why. Does somebody else know?
> >>
> >> i can't say exactly what difference it makes, but i thought the idea was to
> >> change a convolution operation into one of addition. if WindowedSignal is
> >> the convolution of two sequences, FFT(WindowedSignal) will be the product of
> >> the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of
> >> the Logs of the FFTs of the two sequences, etc.
> >
> > Logarithms are so hard to digest,
>
> why? log(A*B) = log(A) + log(B) . big deel.
>
> > but thanks for the hint,
> > now i can google up the rest.
>
> maybe borrow Oppenheim and Schafer if you don't have it.

Make sure you get the 1975 book. That's the only general DSP book that goes into the cepstrum in any depth. Probably because Oppenheim wrote his PhD thesis on cepstra in the early 1970s.

Rune
in article 1116103703.918796.202050@g49g2000cwa.googlegroups.com,
Rune Allnor at allnor@tele.ntnu.no wrote on 05/14/2005 16:48:

> I don't know much 'bout speech processing but I did look into the
> cepstrum a couple of years ago. The problem with the complex cepstrum
> is phase unwrapping.

it's not such a big deal:

  arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }

                = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )

                = arg{ X(k) }
                  + arctan( ( Im{X(k+1)}*Re{X(k)} + Re{X(k+1)}*Im{X(k)} ) /
                            ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )

i think that's right, ain't it?

often, all we need is the derivative or the difference and that's pretty secure.

> If you can get away by using the real cepstrum, do that.

i occasionally get into arguments about whether or not you can neglect phase in audio signals. dunno if it's okay for speech, but i don't think it's a good idea for generalized audio signals.

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
in article BEABF289.7478%rbj@audioimagination.com,
robert bristow-johnson at rbj@audioimagination.com wrote on 05/14/2005 18:23:

>   arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }
>
>                 = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )
>
>                 = arg{ X(k) }
>                   + arctan( ( Im{X(k+1)}*Re{X(k)} + Re{X(k+1)}*Im{X(k)} ) /
>                             ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )
>
> i think that's right, ain't it?

not quite. dropped a sign.

  arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }

                = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )

                = arg{ X(k) }
                  + arctan( ( Im{X(k+1)}*Re{X(k)} - Re{X(k+1)}*Im{X(k)} ) /
                            ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )

i think that's it.

--
r b-j rbj@audioimagination.com
"Imagination is more important than knowledge."
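
A minimal sketch of the corrected recursion, assuming NumPy. One deliberate swap: it uses the four-quadrant np.arctan2 on the numerator and denominator instead of a plain arctan of their ratio, so each step can span (-pi, pi] rather than (-pi/2, pi/2). The function name and the delayed-impulse test signal are hypothetical.

```python
import numpy as np

def unwrap_by_increments(X):
    """Accumulate arg{X(k+1)/X(k)} across bins, starting from arg{X(0)}."""
    re, im = X.real, X.imag
    num = im[1:] * re[:-1] - re[1:] * im[:-1]  # Im{X(k+1) * conj(X(k))}
    den = re[1:] * re[:-1] + im[1:] * im[:-1]  # Re{X(k+1) * conj(X(k))}
    steps = np.arctan2(num, den)               # four-quadrant arctan, each step in (-pi, pi]
    start = np.arctan2(im[0], re[0])
    return start + np.concatenate(([0.0], np.cumsum(steps)))

# An impulse delayed by 3 samples has the linear phase -2*pi*3*k/N.
n = 64
x = np.zeros(n)
x[3] = 1.0
phase = unwrap_by_increments(np.fft.fft(x))
print(phase[:4])
```

The accumulated phase is only right when the true phase changes by less than pi between adjacent bins; a denser frequency grid (more zero padding) is the usual way to make that more likely.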
robert bristow-johnson wrote:
> in article 1116103703.918796.202050@g49g2000cwa.googlegroups.com,
> Rune Allnor at allnor@tele.ntnu.no wrote on 05/14/2005 16:48:
>
> > I don't know much 'bout speech processing but I did look into the
> > cepstrum a couple of years ago. The problem with the complex cepstrum
> > is phase unwrapping.
>
> it's not such a big deal:
>
>   arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }
>
>                 = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )
>
>                 = arg{ X(k) }
>                   + arctan( ( Im{X(k+1)}*Re{X(k)} + Re{X(k+1)}*Im{X(k)} ) /
>                             ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )
>
> i think that's right, ain't it?

It looks OK. The problem is, according to O&S, that the cepstrum needs to have a certain form (real-valued) and so the log|X(w)| and arg{X(w)} terms need to be Hilbert transform pairs. In order to do that, both need to be analytic (i.e. continuous) on the unit circle in z domain.

Taking a closer look at the phase, it is clearly ambiguous:

  arg{X(w)} = phi(w) + k*2*pi

where k is any integer. So in addition to doing the arctan thing, you need to find a k that gets the phase function over the spectrum to meet the requirements of being analytic.

There was an IEEE paper on such issues by Tribolet in 1977, I think. Check with IEEE Xplore.

> often, all we need is the derivative or the difference and that's pretty
> secure.

I don't know. I guess this depends on the application. The problem I found, that made me abandon the cepstrum for the applications I had in mind, was that the derivative of the phase governs causality of the signal. The group delay is given as

  v_g(w) = -d arg{X(w)}/dw

and for a continuous, periodic, non-constant phase function arg{X(w)}, the group delay v_g(w) must necessarily become negative in some frequency range. So I ended up with non-causal signals where I didn't want any.

> > If you can get away by using the real cepstrum, do that.
>
> i occasionally get into arguments about whether or not you can neglect phase
> in audio signals. dunno if it's okay for speech, but i don't think it's a
> good idea for generalized audio signals.

Well, it certainly wasn't a good idea for the non-speech signals I had in mind.

Rune
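
A small numerical sketch of the sign argument about group delay, assuming NumPy. If the phase is continuous and periodic in w, its derivative integrates to zero over one period, so the group delay -d arg{X(w)}/dw averages to zero and has to go negative somewhere unless it is constant. The filter H(z) = 1 - 0.5 z^-1 is a hypothetical example picked only because its phase is continuous and periodic; it illustrates the sign argument and nothing about the original application or its causality problem.

```python
import numpy as np

def group_delay(h, n_fft=512):
    """Group delay -d(phase)/dw in samples, on n_fft points around the unit circle."""
    H = np.fft.fft(h, n_fft)
    phase = np.unwrap(np.angle(H))
    dw = 2 * np.pi / n_fft           # frequency spacing in radians per sample
    return -np.gradient(phase, dw)   # finite-difference derivative of the phase

# Hypothetical example: H(z) = 1 - 0.5 z^-1, whose phase is continuous and periodic.
gd = group_delay(np.array([1.0, -0.5]))
print(gd.min(), gd.max(), gd.mean())
```

For this filter the group delay is negative around w = 0 and positive around w = pi, with a mean close to zero.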