comp.dsp | Pitch detection in voice (singing) | page 2

Reply by axlq ● May 13, 2005

In article <W_ednVz8vJ1FHhnfRVn-tQ@giganews.com>,
smuglr <d.mcgilvray@elec.gla.ac.uk> wrote:
>I know the general subject of pitch detection has been flogged to death,
>but I am looking specifically for pitch detection in a sung melody.

I'm wondering how that will work with some opera singers, who have such
a heavy vibrato in their voice that there's no detectable constant
pitch, rather a modulated frequency.

(While I like opera, I also never understood why they need to sing
that way.  Can't they hold a note?)

-A

Reply by robert bristow-johnson ● May 13, 2005

in article 1115999093.675318.252040@g43g2000cwa.googlegroups.com,
cn@c14sw.de at cn@c14sw.de wrote on 05/13/2005 11:44:

> most formulas I've seen go
> Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
> or
> Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),

are you sure about the Abs?

> I think the log-stage makes a difference, though I can't remember
> why.
> Does somebody else know?
> 

i can't say exactly what difference it makes, but i thought the idea was to
change a convolution operation into one of addition.  if WindowedSignal is
the convolution of two sequences, FFT(WindowedSignal) will be the product of
the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of
the Logs of the FFTs of the two sequences, etc.
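
A minimal NumPy sketch of the convolution-to-addition idea described above. It assumes circular convolution (so the DFT product relation is exact) and uses the log magnitude, i.e. the real cepstrum. The names `real_cepstrum`, `excitation`, and `impulse_response` are made up for illustration and do not come from the thread.

```python
import numpy as np

def real_cepstrum(x):
    """Inverse FFT of the log magnitude spectrum (the 'real cepstrum')."""
    spectrum = np.fft.fft(x)
    return np.fft.ifft(np.log(np.abs(spectrum))).real

rng = np.random.default_rng(0)
n = 256
excitation = rng.standard_normal(n)        # hypothetical first sequence
impulse_response = rng.standard_normal(n)  # hypothetical second sequence

# Circular convolution, computed through the DFT so that
# FFT(convolved) = FFT(excitation) * FFT(impulse_response) holds exactly.
convolved = np.fft.ifft(
    np.fft.fft(excitation) * np.fft.fft(impulse_response)
).real

# log|A*B| = log|A| + log|B|, so the cepstra simply add.
lhs = real_cepstrum(convolved)
rhs = real_cepstrum(excitation) + real_cepstrum(impulse_response)
max_error = np.max(np.abs(lhs - rhs))
print("largest difference:", max_error)
```

The difference should be at rounding-noise level. With linear rather than circular convolution the same relation holds only if the FFT is long enough to hold the full convolution result.
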
-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by ● May 14, 2005

robert bristow-johnson schrieb:
> > most formulas I've seen go
> > Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
> > or
> > Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),
>
> are you sure about the Abs?

Perhaps Magnitude is the correct term.


> > I think the log-stage makes a difference, though I can't remember
> > why.
> > Does somebody else know?
> >
>
> i can't say exactly what difference it makes, but i thought the idea was to
> change a convolution operation into one of addition.  if WindowedSignal is
> the convolution of two sequences, FFT(WindowedSignal) will be the product of
> the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of
> the Logs of the FFTs of the two sequences, etc.

Logarithms are so hard to digest,
but thanks for the hint,
now i can google up the rest. 


Carsten Neubauer
http://www.c14sw.de

Reply by MA ● May 14, 2005

Hi!

When you take the IDFT of the log of the magnitude of the DFT you get the
real cepstrum. If X(k)=|X(k)|e^(j*phi(k)), then log(X(k))=log|X(k)|+j*phi(k).
The IDFT of this is the complex cepstrum. I don't think the phase is
important in the pitch detection with the cepstrum since we're looking for
periodic patterns in the magnitude spectrum.
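
A rough sketch of real-cepstrum pitch detection as described above, on a synthetic tone rather than real singing. The sample rate, test pitch, harmonic amplitudes, and the 80 to 500 Hz search range are all assumptions chosen for illustration.

```python
import numpy as np

fs = 8000.0   # sample rate in Hz (assumed)
f0 = 200.0    # fundamental of the synthetic test tone in Hz (assumed)
n = 1024
t = np.arange(n) / fs

# Harmonic-rich test tone: partials 1..19 with amplitudes 1/h.
x = sum(np.sin(2 * np.pi * f0 * h * t) / h for h in range(1, 20))
x = x * np.hanning(n)

# Real cepstrum: IDFT of the log magnitude of the DFT.
log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)  # small offset avoids log(0)
cepstrum = np.fft.ifft(log_mag).real

# Look for the strongest peak among quefrencies matching 80..500 Hz.
q_min = int(fs / 500.0)
q_max = int(fs / 80.0)
peak = q_min + np.argmax(cepstrum[q_min:q_max])
f0_estimate = fs / peak
print("peak quefrency (samples):", peak, "-> f0 estimate (Hz):", f0_estimate)
```

The peak is expected near fs/f0 = 40 samples. Because the quefrency axis is in whole samples, the estimate is quantized to fs divided by an integer, which is why a refinement step is attractive.
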

I used the real cepstrum in a pitch detector recently while doing a
project course at school. The resolution is limited by the DFT length, but
the estimate can be improved by using the phase from the DFT bin which has
been found with the cepstrum, and the phase from the same DFT bin in the
previous window. 
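
The phase-based refinement mentioned above can be sketched as follows. This is the usual phase-vocoder style frequency estimate, which may or may not match what the project actually did. The hop size, test frequency, and the use of the nearest DFT bin as a stand-in for the cepstrum's coarse estimate are assumptions.

```python
import numpy as np

fs = 8000.0        # sample rate in Hz (assumed)
n = 1024           # DFT length, bin spacing fs/n = 7.8125 Hz
hop = 256          # samples between the two windows (assumed)
true_freq = 203.7  # test frequency in Hz (assumed)

t = np.arange(n + hop) / fs
x = np.sin(2 * np.pi * true_freq * t)

window = np.hanning(n)
frame_prev = np.fft.fft(x[:n] * window)
frame_curr = np.fft.fft(x[hop:hop + n] * window)

# Coarse bin; here simply the nearest bin, standing in for the cepstrum result.
k = int(round(true_freq * n / fs))

# Phase advance expected if the tone sat exactly on bin k.
expected = 2 * np.pi * k * hop / n
measured = np.angle(frame_curr[k]) - np.angle(frame_prev[k])

# Wrap the deviation into (-pi, pi] and convert it to a frequency offset.
deviation = np.angle(np.exp(1j * (measured - expected)))
refined_freq = (k / n + deviation / (2 * np.pi * hop)) * fs
print("bin centre (Hz):", k * fs / n, "refined (Hz):", refined_freq)
```

The refinement is only unambiguous while the true frequency lies within fs/(2*hop) of the bin centre, so a shorter hop trades noise robustness for a wider capture range.
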

Perhaps this is old news... I'm just a student who wants to "talk the
talk";)

/ M
		

Reply by robert bristow-johnson ● May 14, 2005

in article 1116054165.853447.127780@z14g2000cwz.googlegroups.com,
cn@c14sw.de at cn@c14sw.de wrote on 05/14/2005 03:02:

> robert bristow-johnson schrieb:
>>> most formulas I've seen go
>>> Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
>>> or
>>> Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),
>> 
>> are you sure about the Abs?
> 
> Perhaps Magnitude is the correct term.

i don't think it belongs there at all, however you name it.  i recognize it
as the "real cepstrum", but if you're gonna apply them theorems that you see
in the back of O&S regarding cepstrum (such as "homomorphic deconvolution of
speech" to get the periodic driving function, and thus the period), it looks
like to me that you need the "complex cepstrum".  being that i never did
anything with cepstrum since grad school, i can be wrong about that.

>>> I think the log-stage makes a difference, though I can't remember
>>> why.  Does somebody else know?
>> 
>> i can't say exactly what difference it makes, but i thought the idea was to
>> change a convolution operation into one of addition.  if WindowedSignal is
>> the convolution of two sequences, FFT(WindowedSignal) will be the product of
>> the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of
>> the Logs of the FFTs of the two sequences, etc.
> 
> Logarithms are so hard to digest,

why?  log(A*B) = log(A) + log(B) .  big deel.

> but thanks for the hint,
> now i can google up the rest.

maybe borrow Oppenheim and Schafer if you don't have it.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by Ronald H. Nicholson Jr. ● May 14, 2005

In article <jc-dnQ-Dd59MGhnfRVn-iA@giganews.com>,
smuglr <d.mcgilvray@elec.gla.ac.uk> wrote:
>>I'm currently using auto-correlation to find an estimate of the
>>fundamental frequency, and then using phase-unwrapping to get high
>>resolution. 
>
>Meant to say Cepstrum, not auto-correlation - oops!

What window width (in time, milliseconds) are you using with respect
to the lowest frequency of interest?  

I assume that you are doing phase unwrapping using successive frames
(overlapped or not?), which means you have some historical data with
which to try processing larger frames.  If latency is an issue, you
might be able to use resampling to increase the frame size by less than
a factor of 2X.  Does your error rate at a given fundamental frequency
vary with frame size?

Also, are you looking at the magnitude or real part of your real Cepstrum?
e.g.
Re(ifft(log(mag(fft(x)))))
or
mag(ifft(log(mag(fft(x)))))
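
A small check of how those two expressions differ, assuming a real-valued input `x` (random noise here, purely as a stand-in). For real input the log magnitude spectrum is real and even, so its inverse FFT is real apart from rounding noise, and the two expressions differ only in that `mag` throws away the sign.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(512)   # stand-in for one real-valued signal frame

c = np.fft.ifft(np.log(np.abs(np.fft.fft(x))))

max_imag = np.max(np.abs(c.imag))           # rounding noise only
has_negative = bool(np.any(c.real < 0))     # Re(...) keeps negative values
same_up_to_sign = bool(np.allclose(np.abs(c), np.abs(c.real)))

print("largest imaginary part:", max_imag)
print("real part has negative values:", has_negative)
print("mag(...) equals |Re(...)|:", same_up_to_sign)
```

So a peak picker looking for the largest positive value can behave differently on the two versions whenever a large negative cepstral value is present.
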

IMHO. YMMV.
-- 
Ron Nicholson   rhn AT nicholson DOT com   http://www.nicholson.com/rhn/ 
#include <canonical.disclaimer>        // only my own opinions, etc.

Reply by Rune Allnor ● May 14, 2005

robert bristow-johnson wrote:
> in article 1116054165.853447.127780@z14g2000cwz.googlegroups.com,
> cn@c14sw.de at cn@c14sw.de wrote on 05/14/2005 03:02:
>
> > robert bristow-johnson schrieb:
> >>> most formulas I've seen go
> >>> Cepstrum = FFT( Log(Abs(FFT(WindowedSignal))))
> >>> or
> >>> Cepstrum = IFFT( Log(Abs(FFT(WindowedSignal)))),
> >>
> >> are you sure about the Abs?
> >
> > Perhaps Magnitude is the correct term.
>
> i don't think it belongs there at all, however you name it.  i recognize it
> as the "real cepstrum", but if you're gonna apply them theorems that you see
> in the back of O&S regarding cepstrum (such as "homomorphic deconvolution of
> speech" to get the periodic driving function, and thus the period), it looks
> like to me that you need the "complex cepstrum".  being that i never did
> anything with cepstrum since grad school, i can be wrong about that.

I don't know much 'bout speech processing but I did look into the
cepstrum a couple of years ago. The problem with the complex cepstrum
is phase unwrapping. If you can get away by using the real cepstrum,
do that.

> >>> I think the log-stage makes a difference, though I can't remember
> >>> why.  Does somebody else know?
> >>
> >> i can't say exactly what difference it makes, but i thought the idea was to
> >> change a convolution operation into one of addition.  if WindowedSignal is
> >> the convolution of two sequences, FFT(WindowedSignal) will be the product of
> >> the FFTs of the two sequences, Log(FFT(WindowedSignal)) will be the sum of
> >> the Logs of the FFTs of the two sequences, etc.
> >
> > Logarithms are so hard to digest,
>
> why?  log(A*B) = log(A) + log(B) .  big deel.
>
> > but thanks for the hint,
> > now i can google up the rest.
>
> maybe borrow Oppenheim and Schafer if you don't have it.

Make sure you get the 1975 book. That's the only general DSP book that
goes into the cepstrum in any depth. Probably because Oppenheim wrote
his PhD thesis on cepstra in the early 1970s.

Rune

Reply by robert bristow-johnson ● May 14, 2005

in article 1116103703.918796.202050@g49g2000cwa.googlegroups.com, Rune
Allnor at allnor@tele.ntnu.no wrote on 05/14/2005 16:48:

> I don't know much 'bout speech processing but I did look into the
> cepstrum a couple of years ago. The problem with the complex cepstrum
> is phase unwrapping.

it's not such a big deal:

   arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }

                 = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )

                 = arg{ X(k) }

                 + arctan( ( Im{X(k+1)}*Re{X(k)} + Re{X(k+1)}*Im{X(k)} )/
                           ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )

i think that's right, ain't it?

often, all we need is the derivative or the difference and that's pretty
secure.

> If you can get away by using the real cepstrum, do that.

i occasionally get into arguments about whether or not you can neglect phase
in audio signals.  dunno if it's okay for speech, but i don't think it's a
good idea for generalized audio signals.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by robert bristow-johnson ● May 14, 2005

in article BEABF289.7478%rbj@audioimagination.com, robert bristow-johnson at
rbj@audioimagination.com wrote on 05/14/2005 18:23:

> arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }
> 
>               = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )
> 
>               = arg{ X(k) }
> 
>                   + arctan( ( Im{X(k+1)}*Re{X(k)} + Re{X(k+1)}*Im{X(k)} )/
>                             ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )
> 
> i think that's right, ain't it?
> 

not quite.  dropped a sign.

   arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }

                 = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )

                 = arg{ X(k) }

                   + arctan( ( Im{X(k+1)}*Re{X(k)} - Re{X(k+1)}*Im{X(k)} )/
                             ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )

i think that's it.
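
A short NumPy sketch of the corrected recursion, under a couple of assumptions: the test spectrum is just the FFT of random noise, and `np.arctan2` is used in place of a plain arctan of the ratio, since the ratio alone loses the quadrant. The numerator and denominator are the imaginary and real parts of X(k+1) times the conjugate of X(k).

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.fft.fft(rng.standard_normal(64))   # stand-in spectrum

a, b = X[1:], X[:-1]                      # X(k+1) and X(k)
num = a.imag * b.real - a.real * b.imag   # Im{ X(k+1) * conj(X(k)) }
den = a.real * b.real + a.imag * b.imag   # Re{ X(k+1) * conj(X(k)) }
increment = np.arctan2(num, den)          # four-quadrant phase step

# Accumulate the steps, starting from arg{X(0)}.
phase = np.angle(X[0]) + np.concatenate(([0.0], np.cumsum(increment)))

# Compare against NumPy's own unwrapping of the principal-value phase.
reference = np.unwrap(np.angle(X))
print("matches np.unwrap:", np.allclose(phase, reference))
```

Each step is still only known modulo 2*pi, so this running sum recovers the true phase only when the phase changes by less than pi between neighbouring bins.
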


-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by Rune Allnor ● May 15, 2005

robert bristow-johnson wrote:
> in article 1116103703.918796.202050@g49g2000cwa.googlegroups.com, Rune
> Allnor at allnor@tele.ntnu.no wrote on 05/14/2005 16:48:
>
> > I don't know much 'bout speech processing but I did look into the
> > cepstrum a couple of years ago. The problem with the complex cepstrum
> > is phase unwrapping.
>
> it's not such a big deal:
>
>    arg{ X(k+1) } = arg{ X(k) } + arg{ X(k+1)/X(k) }
>
>                  = arg{ X(k) } + arctan( Im{X(k+1)/X(k)} / Re{X(k+1)/X(k)} )
>
>                  = arg{ X(k) }
>
>                  + arctan( ( Im{X(k+1)}*Re{X(k)} + Re{X(k+1)}*Im{X(k)} )/
>                            ( Re{X(k+1)}*Re{X(k)} + Im{X(k+1)}*Im{X(k)} ) )
>
> i think that's right, ain't it?

It looks OK.

The problem is, according to O&S, that the cepstrum needs to
have a certain form (real-valued) and so the log|X(w)| and
arg{X(w)} terms need to be Hilbert transform pairs. In order
to do that, both need to be analytic (i.e. continuous) on the
unit circle in z domain.

Taking a closer look at the phase, it is clearly ambiguous:

arg{X(w)} = phi(w) + k*2*pi

where k is any integer. So in addition to doing the arctan thing,
you need to find a k that gets the phase function over the spectrum
to meet the requirements of being analytic.

There was an IEEE paper on such issues by Tribolet in 1977, I think.
Check with IEEExplore.

> often, all we need is the derivative or the difference and that's pretty
> secure.

I don't know. I guess this depends on the application. The problem
I found, that made me abandon the cepstrum for the applications I
had in mind, was that the derivative of the phase governs causality
of the signal.

The group delay is given as

   v_g(w) = -d arg{X(w)}/dw

and for a continuous, periodic, non-constant phase function arg{X(w)},
the group delay v_g(w) must necessarily become negative in some frequency
range. So I ended up with non-causal signals where I didn't want any.
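
A minimal numerical illustration of the group delay definition above. It only demonstrates the sign convention on two toy signals (a delayed impulse and a circularly advanced one) and does not reproduce the application described in this post. The helper name `group_delay` and the signal length are assumptions.

```python
import numpy as np

n = 128
delay = 5

def group_delay(x):
    """Finite-difference estimate of -d arg{X(w)}/dw, in samples."""
    phase = np.unwrap(np.angle(np.fft.fft(x)))
    dw = 2 * np.pi / len(x)          # spacing between DFT bins in rad/sample
    return -np.diff(phase) / dw

causal = np.zeros(n)
causal[delay] = 1.0                  # impulse 5 samples after n = 0

anticausal = np.zeros(n)
anticausal[-delay] = 1.0             # circularly, 5 samples before n = 0

gd_causal = group_delay(causal)
gd_anticausal = group_delay(anticausal)
print("delayed impulse: ", gd_causal[:3])
print("advanced impulse:", gd_anticausal[:3])
```

The delayed impulse gives a constant group delay of +5 samples and the advanced one gives -5, which is the sense in which a negative group delay goes with energy arriving before time zero.
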

> > If you can get away by using the real cepstrum, do that.
>
> i occasionally get into arguments about whether or not you can neglect phase
> in audio signals.  dunno if it's okay for speech, but i don't think it's a
> good idea for generalized audio signals.

Well, it certainly wasn't a good idea for the non-speech signals
I had in mind.

Rune
