In article <W_ednVz8vJ1FHhnfRVn-tQ@giganews.com>, smuglr <d.mcgilvray@elec.gla.ac.uk> wrote:
>I know the general subject of pitch detection has been flogged to death,
>but I am looking specifically for pitch detection in a sung melody.

I'm wondering how that will work with some opera singers, who have such a heavy vibrato in their voice that there's no detectable constant pitch, rather a modulated frequency. (While I like opera, I also never understood why they need to sing that way. Can't they hold a note?)

-A
Pitch detection in voice (singing)
Started by ●May 13, 2005
Reply by ●May 13, 2005
Reply by ●May 13, 2005
in article 1115999093.675318.252040@g43g2000cwa.googlegroups.com, cn@c14sw.de at cn@c14sw.de wrote on 05/13/2005 11:44:

> most formulas I've seen go
> Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
> or
> Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),

are you sure about the Abs?

> I think the log-stage makes a difference, though I can't remember
> why.
> Does somebody else know?

i can't say exactly what difference it makes, but i thought the idea was to change a convolution operation into one of addition. if WindowedSignal is the convolution of two sequences, FFT(WindowedSignal) will be the product of the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of the Logs of the FFTs of the two sequences, etc.

--
r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
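A quick numerical check of the convolution-to-addition idea may help here. This is only a sketch: the sequences `a` and `b` are made-up random data standing in for the two convolved signals, and the FFTs are zero-padded to the full output length (an assumption on my part) so that the DFT product corresponds to linear rather than circular convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(16)   # hypothetical first sequence
b = rng.standard_normal(8)    # hypothetical second sequence

x = np.convolve(a, b)         # linear convolution, length 16 + 8 - 1
n = len(x)

# Zero-pad to the full output length so the DFT product equals linear convolution.
A = np.fft.fft(a, n)
B = np.fft.fft(b, n)
X = np.fft.fft(x)

# Convolution in time is multiplication in frequency ...
product_matches = np.allclose(X, A * B)

# ... and the log turns that product into a sum (shown here for the magnitudes).
log_sum_matches = np.allclose(np.log(np.abs(X)),
                              np.log(np.abs(A)) + np.log(np.abs(B)))

print(product_matches, log_sum_matches)
```

Only the magnitudes are compared in the last step, which is the "real cepstrum" view; the complex log would also add the phases.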
Reply by ●May 14, 2005
robert bristow-johnson wrote:

> > most formulas I've seen go
> > Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
> > or
> > Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),
>
> are you sure about the Abs?

Perhaps Magnitude is the correct term.

> > I think the log-stage makes a difference, though I can't remember
> > why.
> > Does somebody else know?
>
> i can't say exactly what difference it makes, but i thought the idea was to
> change a convolution operation into one of addition. if WindowedSignal is
> the convolution of two sequences, FFT(WindowedSignal) will be the product of
> the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of
> the Logs of the FFTs of the two sequences, etc.

Logarithms are so hard to digest, but thanks for the hint, now i can google up the rest.

Carsten Neubauer
http://www.c14sw.de
Reply by ●May 14, 2005
Hi!

When you take the IDFT of the log of the magnitude of the DFT, you get the real cepstrum. If X(k) = |X(k)| e^(j phi(k)), then log(X(k)) = log|X(k)| + j phi(k). The IDFT of this is the complex cepstrum. I don't think the phase is important for pitch detection with the cepstrum, since we're looking for periodic patterns in the magnitude spectrum.

I used the real cepstrum in a pitch detector recently while doing a project course at school. The resolution is limited by the DFT length, but the estimate can be improved by using the phase from the DFT bin which has been found with the cepstrum, and the phase from the same DFT bin in the previous window.

Perhaps this is old news... I'm just a student who wants to "talk the talk" ;)

/ M

This message was sent using the Comp.DSP web interface on www.DSPRelated.com
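The real-cepstrum pitch estimate described above might look roughly like this. It is a minimal sketch and not the poster's actual project code: the function name `cepstrum_pitch`, the Hann window, the 80 to 500 Hz search range, and the synthetic test tone are all assumptions made for illustration.

```python
import numpy as np

def cepstrum_pitch(frame, fs, fmin=80.0, fmax=500.0):
    """Rough f0 estimate (Hz) for one frame from the peak of the real cepstrum."""
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)  # small offset avoids log(0)
    cepstrum = np.fft.ifft(log_mag).real                     # real cepstrum
    qmin = int(fs / fmax)   # shortest pitch period searched, in samples
    qmax = int(fs / fmin)   # longest pitch period searched, in samples
    period = qmin + np.argmax(cepstrum[qmin:qmax])
    return fs / period

# Hypothetical test tone: 200 Hz fundamental with harmonics up to 3800 Hz,
# plus a little noise to fill the valleys between the harmonics.
fs = 8000
t = np.arange(1024) / fs
rng = np.random.default_rng(0)
tone = sum(np.sin(2 * np.pi * 200.0 * h * t) for h in range(1, 20))
tone = tone + 0.01 * rng.standard_normal(len(t))

f0_est = cepstrum_pitch(tone, fs)
print(f0_est)
```

Because the peak is picked at an integer number of samples, the estimate can only take the values fs/period, which is the coarse resolution that the phase-based refinement mentioned above is meant to improve.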
Reply by ●May 14, 2005
in article 1116054165.853447.127780@z14g2000cwz.googlegroups.com, cn@c14sw.de at cn@c14sw.de wrote on 05/14/2005 03:02:

> robert bristow-johnson wrote:
>>> most formulas I've seen go
>>> Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
>>> or
>>> Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),
>>
>> are you sure about the Abs?
>
> Perhaps Magnitude is the correct term.

i don't think it belongs there at all, however you name it. i recognize it as the "real cepstrum", but if you're gonna apply them theorems that you see in the back of O&S regarding cepstrum (such as "homomorphic deconvolution of speech" to get the periodic driving function, and thus the period), it looks like to me that you need the "complex cepstrum". being that i never did anything with cepstrum since grad school, i can be wrong about that.

>>> I think the log-stage makes a difference, though I can't remember
>>> why. Does somebody else know?
>>
>> i can't say exactly what difference it makes, but i thought the idea was to
>> change a convolution operation into one of addition. if WindowedSignal is
>> the convolution of two sequences, FFT(WindowedSignal) will be the product of
>> the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of
>> the Logs of the FFTs of the two sequences, etc.
>
> Logarithms are so hard to digest,

why? log(A*B) = log(A) + log(B) . big deel.

> but thanks for the hint,
> now i can google up the rest.

maybe borrow Oppenheim and Schafer if you don't have it.

--
r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
Reply by ●May 14, 2005
In article <jc-dnQ-Dd59MGhnfRVn-iA@giganews.com>, smuglr <d.mcgilvray@elec.gla.ac.uk> wrote:
>>I'm currently using auto-correlation to find an estimate of the
>>fundamental frequency, and then using phase-unwrapping to get high
>>resolution.
>
>Meant to say Cepstrum, not auto-correlation - oops!

What window width (in time, milliseconds) are you using with respect to the lowest frequency of interest? I assume that you are doing phase unwrapping using successive frames (overlapped or not?), which means you have some historical data with which to try processing larger frames. If latency is an issue, you might be able to use resampling to increase the frame size by less than a factor of 2X.

Does your error rate at a given fundamental frequency vary with frame size?

Also, are you looking at the magnitude or real part of your real Cepstrum? e.g.

  Re(ifft(log(mag(fft(x)))))
or
  mag(ifft(log(mag(fft(x)))))

IMHO. YMMV.
--
Ron Nicholson rhn AT nicholson DOT com http://www.nicholson.com/rhn/
#include <canonical.disclaimer> // only my own opinions, etc.
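One way to see what that choice amounts to: for a real input frame, log(mag(fft(x))) is real and even, so its inverse FFT is real up to rounding. Re() then only discards rounding noise, while mag() also throws away the sign of each cepstral coefficient. A small sketch with made-up data (the random frame `x` is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)                 # hypothetical real-valued frame

c = np.fft.ifft(np.log(np.abs(np.fft.fft(x))))

max_imag = np.max(np.abs(c.imag))            # rounding noise only
re_version = c.real                          # Re(ifft(log(mag(fft(x)))))
mag_version = np.abs(c)                      # mag(ifft(log(mag(fft(x)))))

# The two versions agree in size and differ only where a coefficient is negative.
same_size = np.allclose(mag_version, np.abs(re_version))
has_negative = bool(np.any(re_version < 0))

print(max_imag, same_size, has_negative)
```

For peak picking this matters if a strong negative cepstral value could be mistaken for a pitch peak once its sign is removed.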
Reply by ●May 14, 2005
robert bristow-johnson wrote:

> in article 1116054165.853447.127780@z14g2000cwz.googlegroups.com,
> cn@c14sw.de at cn@c14sw.de wrote on 05/14/2005 03:02:
>
> > robert bristow-johnson wrote:
> >>> most formulas I've seen go
> >>> Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
> >>> or
> >>> Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),
> >>
> >> are you sure about the Abs?
> >
> > Perhaps Magnitude is the correct term.
>
> i don't think it belongs there at all, however you name it. i recognize it
> as the "real cepstrum", but if you're gonna apply them theorems that you see
> in the back of O&S regarding cepstrum (such as "homomorphic deconvolution of
> speech" to get the periodic driving function, and thus the period), it looks
> like to me that you need the "complex cepstrum". being that i never did
> anything with cepstrum since grad school, i can be wrong about that.

I don't know much 'bout speech processing but I did look into the cepstrum a couple of years ago. The problem with the complex cepstrum is phase unwrapping. If you can get away by using the real cepstrum, do that.

> >>> I think the log-stage makes a difference, though I can't remember
> >>> why. Does somebody else know?
> >>
> >> i can't say exactly what difference it makes, but i thought the idea was to
> >> change a convolution operation into one of addition. if WindowedSignal is
> >> the convolution of two sequences, FFT(WindowedSignal) will be the product of
> >> the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of
> >> the Logs of the FFTs of the two sequences, etc.
> >
> > Logarithms are so hard to digest,
>
> why? log(A*B) = log(A) + log(B) . big deel.
>
> > but thanks for the hint,
> > now i can google up the rest.
>
> maybe borrow Oppenheim and Schafer if you don't have it.

Make sure you get the 1975 book. That's the only general DSP book that goes into the cepstrum in any depth. Probably because Oppenheim wrote his PhD thesis on cepstra in the early 1970s.

Rune
Reply by ●May 14, 2005
in article 1116103703.918796.202050@g49g2000cwa.googlegroups.com, Rune Allnor at allnor@tele.ntnu.no wrote on 05/14/2005 16:48:

> I don't know much 'bout speech processing but I did look into the
> cepstrum a couple of years ago. The problem with the complex cepstrum
> is phase unwrapping.

it's not such a big deal:

  arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }

                = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )

                = arg{ X(k) }

                  + arctan( ( Im{X(k+1)}*Re{X(k)} + Re{X(k+1)}*Im{X(k)} ) /
                            ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )

i think that's right, ain't it?

often, all we need is the derivative or the difference and that's pretty secure.

> If you can get away by using the real cepstrum, do that.

i occasionally get into arguments about whether or not you can neglect phase in audio signals. dunno if it's okay for speech, but i don't think it's a good idea for generalized audio signals.

--
r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
Reply by ●May 14, 2005
in article BEABF289.7478%rbj@audioimagination.com, robert bristow-johnson at rbj@audioimagination.com wrote on 05/14/2005 18:23:

> arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }
>
>               = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )
>
>               = arg{ X(k) }
>
>                 + arctan( ( Im{X(k+1)}*Re{X(k)} + Re{X(k+1)}*Im{X(k)} ) /
>                           ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )
>
> i think that's right, ain't it?

not quite. dropped a sign.

  arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }

                = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )

                = arg{ X(k) }

                  + arctan( ( Im{X(k+1)}*Re{X(k)} - Re{X(k+1)}*Im{X(k)} ) /
                            ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )

i think that's it.

--
r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
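The corrected recursion can be checked numerically. The sketch below is my own illustration, not code from the thread: `unwrapped_phase` is a hypothetical helper name, the test spectrum is synthetic, and `np.arctan2` is swapped in for the plain arctan of the ratio so that the quadrant is kept. It assumes the true phase changes by less than pi between adjacent bins.

```python
import numpy as np

def unwrapped_phase(X):
    """Accumulate arg{X(k+1)/X(k)} from bin to bin, using the corrected sign."""
    re, im = X.real, X.imag
    num = im[1:] * re[:-1] - re[1:] * im[:-1]   # Im{ X(k+1) * conj(X(k)) }
    den = re[1:] * re[:-1] + im[1:] * im[:-1]   # Re{ X(k+1) * conj(X(k)) }
    steps = np.arctan2(num, den)                # arctan2 keeps the correct quadrant
    return np.angle(X[0]) + np.concatenate(([0.0], np.cumsum(steps)))

# Hypothetical test spectrum: unit magnitude, phase growing far beyond +/- pi,
# with bin-to-bin steps that stay below pi.
k = np.arange(64)
true_phase = 0.3 + 0.5 * k + 0.01 * k**2
X = np.exp(1j * true_phase)

recovered = unwrapped_phase(X)
print(np.max(np.abs(recovered - true_phase)))
```

Dividing by X(k) and multiplying by its conjugate give the same argument, which is why the code never forms the quotient explicitly. If the phase jumps by more than pi between bins, this recursion (like any simple unwrapping) picks the wrong branch.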
Reply by ●May 15, 2005
robert bristow-johnson wrote:

> in article 1116103703.918796.202050@g49g2000cwa.googlegroups.com, Rune
> Allnor at allnor@tele.ntnu.no wrote on 05/14/2005 16:48:
>
> > I don't know much 'bout speech processing but I did look into the
> > cepstrum a couple of years ago. The problem with the complex cepstrum
> > is phase unwrapping.
>
> it's not such a big deal:
>
> arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }
>
>               = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )
>
>               = arg{ X(k) }
>
>                 + arctan( ( Im{X(k+1)}*Re{X(k)} + Re{X(k+1)}*Im{X(k)} ) /
>                           ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )
>
> i think that's right, ain't it?

It looks OK. The problem is, according to O&S, that the cepstrum needs to have a certain form (real-valued) and so the log|X(w)| and arg{X(w)} terms need to be Hilbert transform pairs. In order to do that, both need to be analytic (i.e. continuous) on the unit circle in z domain.

Taking a closer look at the phase, it is clearly ambiguous:

  arg{X(w)} = phi(w) + k*2*pi

where k is any integer. So in addition to doing the arctan thing, you need to find a k that gets the phase function over the spectrum to meet the requirements of being analytic. There was an IEEE paper on such issues by Tribolet in 1977, I think. Check with IEEE Xplore.

> often, all we need is the derivative or the difference and that's pretty
> secure.

I don't know. I guess this depends on the application. The problem I found, that made me abandon the cepstrum for the applications I had in mind, was that the derivative of the phase governs causality of the signal. The group delay is given as

  v_g(w) = -d arg{X(w)}/dw

and for a continuous, periodic, non-constant phase function arg{X(w)}, the group delay v_g(w) must necessarily become negative in some frequency range. So I ended up with non-causal signals where I didn't want any.

> > If you can get away by using the real cepstrum, do that.
>
> i occasionally get into arguments about whether or not you can neglect phase
> in audio signals. dunno if it's okay for speech, but i don't think it's a
> good idea for generalized audio signals.

Well, it certainly wasn't a good idea for the non-speech signals I had in mind.

Rune
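The group-delay argument above can be illustrated numerically: if the phase is continuous and 2*pi-periodic, the integral of v_g(w) over one period is zero, so a non-constant v_g(w) has to take negative values somewhere. The sketch below uses an arbitrary made-up phase function, not one taken from any real signal in this thread.

```python
import numpy as np

w = np.linspace(-np.pi, np.pi, 2001)
phase = 0.8 * np.sin(w) + 0.3 * np.sin(3 * w)   # hypothetical continuous, periodic phase
group_delay = -np.gradient(phase, w)            # v_g(w) = -d arg{X(w)}/dw

# Trapezoidal integral of v_g over one period: equals -(phase(pi) - phase(-pi)),
# which is zero for a periodic phase.
area = np.sum(0.5 * (group_delay[1:] + group_delay[:-1]) * np.diff(w))

print(area, group_delay.min(), group_delay.max())
```

With zero area and a non-constant curve, the minimum is necessarily below zero, which is the negative group delay described above.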