DSPRelated.com
Forums

Pitch Estimation using Autocorrelation

Started by olivers September 7, 2005
Rune Allnor wrote:

> If it is "obvious" to an expert, true or self-proclaimed,
> on speech processing that the pitch is encoded in the
> time-domain autocorrelation function, it is not at all so
> for the generalist or somebody who has other fields of
> interest.

Yesterday I checked out some of my books.

It turns out that the model for speech includes a source comprising an impulse train and a formant filter. If I am right in the impression that the period between pulses determines the pitch of the speech, I agree that the pitch of the speech ought to be possible to extract from the time-domain autocorrelation function.

This would be a consequence of the regular pulse pattern of the speech signal, which as far as I can tell is all but unique to speech signals.

Rune
> It turns out that the model for speech includes a source
> comprising an impulse train and a formant filter. If I am
> right in the impression that the period between pulses
> determines the pitch of the speech, I agree that the pitch
> of the speech ought to be possible to extract from the
> time-domain autocorrelation function.
>
> This would be a consequence of the regular pulse pattern
> of the speech signal, that as far as I can tell is all but
> unique to speech signals.
This is true for voiced speech but not for unvoiced speech (which is characterized by noiselike excitation caused by airflow through a narrow constriction).
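The model discussed in the posts above (impulse train driving a formant filter, with noise excitation for unvoiced sounds) can be tried numerically. What follows is a minimal sketch under stated assumptions, not a method proposed by anyone in this thread: an 8 kHz sample rate, one hypothetical 700 Hz formant modelled as a two-pole resonator, a 60 to 400 Hz search range, and plain white noise as a crude stand-in for unvoiced excitation. All function names are made up for illustration.

```python
import math
import random

def impulse_train(n_samples, period):
    """Idealized voiced source: one unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, freq_hz, bandwidth_hz, fs):
    """Two-pole resonator standing in for a single formant (an assumption)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)          # pole radius
    a1 = -2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    a2 = r * r
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        y0 = s - a1 * y1 - a2 * y2
        y.append(y0)
        y1, y2 = y0, y1
    return y

def autocorr(x, lag):
    """Biased time-domain autocorrelation at a single lag."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

def estimate_period(x, fs, f_min=60.0, f_max=400.0):
    """Return (best_lag, strength), where strength = r[best_lag] / r[0]."""
    lag_min, lag_max = int(fs / f_max), int(fs / f_min)
    best = max(range(lag_min, lag_max + 1), key=lambda k: autocorr(x, k))
    return best, autocorr(x, best) / autocorr(x, 0)

fs = 8000
# Voiced case: impulses every 64 samples (125 Hz) through the "formant".
voiced = resonator(impulse_train(1024, 64), 700.0, 100.0, fs)
lag_v, strength_v = estimate_period(voiced, fs)
print("voiced:   lag", lag_v, "-> F0", fs / lag_v, "Hz, strength", round(strength_v, 2))

# Unvoiced stand-in: white Gaussian noise, no periodic source at all.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(1024)]
lag_u, strength_u = estimate_period(noise, fs)
print("unvoiced: lag", lag_u, "strength", round(strength_u, 2))
```

In the voiced case the strongest autocorrelation peak in the search range falls at the spacing of the source impulses, not at the formant period. In the noise case a "best" lag is still returned, but the normalized peak is small, which is why practical estimators usually pair the lag search with some voicing threshold.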
Glad you got out your books on speech.  Now get them out for
musical acoustics.  The exact same paradigm exists for all
acoustical instruments, whether voice, violin, trumpet,
drum, or piano: a source that is either periodic (usually
rich in harmonics; which ones depends on the instrument) or
aperiodic (usually whitish noise) drives a cavity resonance.
It is not unique to voice, it is the fundamental principle
of all sound created by wo/man.

Chip Wood


"Rune Allnor" <allnor@tele.ntnu.no> wrote in message
news:1126424159.755462.120720@f14g2000cwb.googlegroups.com...
> Rune Allnor wrote:
> Yesterday I checked out some of my books.
>
> It turns out that the model for speech includes a source
> comprising an impulse train and a formant filter. If I am
> right in the impression that the period between pulses
> determines the pitch of the speech, I agree that the pitch
> of the speech ought to be possible to extract from the
> time-domain autocorrelation function.
>
> This would be a consequence of the regular pulse pattern
> of the speech signal, that as far as I can tell is all but
> unique to speech signals.
>
> Rune
I understand that many people outside the speech/audio processing area
get easily confused when a discussion on pitch starts.
While it is certainly understandable for ordinary folks and novices, it
completely amazes me that some people in academia spend their entire
successful careers (in terms of number of published papers) and retire
as distinguished professors while still being totally confused about
the subject.

And I am not even talking about more complex and less intuitive matters
like time-domain vs. short-term vs. frequency-domain analysis
techniques and time-frequency resolution vs. uncertainty principle as
applied to signal processing, etc. etc.

For starters I can suggest automatically substituting "pitch" with
"fundamental frequency" or, better yet, "fundamental period"
(or "glottal period", if you want), wherever you see a discussion
related to speech processing.
This will greatly reduce the amount of confusion.

fizteh89 wrote:
> For starters I can suggest automatically substituting "pitch" with
> "fundamental frequency" or, better yet, "fundamental period"
> (or "glottal period", if you want), wherever you see a discussion
> related to speech processing.
> This will greatly reduce the amount of confusion.

I am not sure that "pitch" and fundamental frequency or period should be considered identical. I prefer to use the term pitch in reference to music or sound perception. But there can be frequency components in an audio signal which are not normally perceived as pitch (masked frequency bands, sub-harmonics, beating, etc.), and vice versa (making a tune by playing back sound samples of a car crash at different sample rates).

IMHO. YMMV.

--
rhn A.T nicholson D.o.T c-O-m
fizteh89 wrote:
> I understand that many people outside of speech/audio processing area
> get easily confused when a discussion on pitch starts.
> While it is certainly understandable for ordinary folks and novices, it
> completely amazes me that some people in academia spend their entire
> successful careers (in terms of number of published papers) and retire
> as distinguished professors while still being totally confused about
> the subject.
>
> And I am not even talking about more complex and less intuitive matters
> like time-domain vs. short-term vs. frequency-domain analysis
> techniques and time-frequency resolution vs. uncertainty principle as
> applied to signal processing, etc. etc.
>
> For starters I can suggest automatically substituting "pitch" with
> "fundamental frequency" or, better yet, "fundamental period"
> (or "glottal period", if you want), wherever you see a discussion
> related to speech processing.
> This will greatly reduce the amount of confusion.

Einstein reminded us that phenomena should be described as simply as possible, but not more simply than that. It is possible to claim that a falling tree makes no sound if there is no listener, but that complicates descriptions of events. Likewise, if one equates pitch to fundamental frequency, then one must invent "perceived pitch" which is not the same thing. R.B-J.'s example of a second harmonic at 0 dB and a fundamental at -60 is an example of perceived pitch -- simpler to just call it pitch -- being higher than the fundamental; synthetic bass is one where the pitch is lower.

Jerry
--
Engineering is the art of making what you want from things you can get.
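The two examples in the post above can be run through a plain autocorrelation peak picker. This is only a sketch with assumed values (8 kHz sample rate, a nominal 100 Hz fundamental, a 60 to 400 Hz search range, hypothetical helper names), and it says nothing about what a listener would report. In the "synthetic bass" case only harmonics 2 to 4 are present; in the other case the fundamental sits 60 dB below the second harmonic.

```python
import math

FS = 8000      # assumed sample rate, Hz
F0 = 100.0     # nominal fundamental, Hz (period = 80 samples)
N = 1600       # 0.2 s of signal

def harmonic_sum(amplitudes):
    """Sum of cosines; amplitudes[h] is the linear gain of harmonic h of F0."""
    return [sum(a * math.cos(2.0 * math.pi * h * F0 * n / FS)
                for h, a in amplitudes.items())
            for n in range(N)]

def autocorr(x, lag):
    """Biased time-domain autocorrelation at a single lag."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

def best_lag(x, f_min=60.0, f_max=400.0):
    """Lag (in samples) of the largest autocorrelation value in the range."""
    lags = range(int(FS / f_max), int(FS / f_min) + 1)
    return max(lags, key=lambda k: autocorr(x, k))

# "Synthetic bass": harmonics 2, 3 and 4 only, nothing at 100 Hz.
missing_fundamental = harmonic_sum({2: 1.0, 3: 1.0, 4: 1.0})
# Second harmonic at 0 dB, fundamental at -60 dB.
weak_fundamental = harmonic_sum({1: 10 ** (-60 / 20), 2: 1.0})

lag_missing = best_lag(missing_fundamental)
lag_weak = best_lag(weak_fundamental)
print("missing fundamental:", lag_missing, "samples,", FS / lag_missing, "Hz")
print("weak fundamental:   ", lag_weak, "samples,", FS / lag_weak, "Hz")
```

Under these assumptions the first signal should yield a lag of 80 samples (100 Hz) although it contains no energy at 100 Hz, and the second a lag of 40 samples (200 Hz) although the waveform strictly repeats every 80 samples. A simple peak picker follows the dominant periodicity, which is neither guaranteed to match the nominal fundamental nor guaranteed to match what is heard.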
Einstein reminded us of many things... One of my favorite quotes is:
"A practical profession is a salvation for a man of my type; an
academic career compels a young man to scientific production, and only
strong characters can resist the temptation of superficial analysis."

You are mixing subjective, psychoacoustic, definitions with objective,
computer-measurable ones.
Human hearing is a rather peculiar instrument and can be easily fooled,
no doubt.
(Have you heard about Huggins pitch or Fourcin pitch?)
But does it have much to do with tracking the exact period of a constantly
changing voice signal in a low-bit-rate vocoder or a pitch-synchronous
front-end feature extractor for a speech-recognition application?

fizteh89 wrote:
> Einstein reminded us of many things... One of my favorite quotes is:
> "A practical profession is a salvation for a man of my type; an
> academic career compels a young man to scientific production, and only
> strong characters can resist the temptation of superficial analysis."
>
> You are mixing subjective, psychoacoustic, definitions with objective,
> computer-measurable ones.

I'm not mixing them, but disentangling them. Either we have pitch and frequency as synonyms (so requiring perceived pitch as a distinction) or we assign objective, computer-measurable attributes to frequency and subjective and psychoacoustic attributes to pitch, which many people do anyway.

> Human hearing is a rather peculiar instrument and can be easily fooled,
> no doubt.
> (Have you heard about Huggins pitch or Fourcin pitch?)

I assume that effects that are manifest only binaurally are outside what I think this discussion is about.

> But does it have much to do with tracking exact period of a constantly
> changing voice signal in a low-bit-rate vocoder or a pitch-synchronous
> front-end feature extractor for a speech-recognition application?

I don't know. I imagine that what a human can't hear is not important for reproducing speech, but it might be important for distinguishing speakers.

Jerry
--
Engineering is the art of making what you want from things you can get.
Hey guys, it is not rocket science.  "Pitch" and "loudness"
are terms of perception that are measurable only by asking a
human.  "Fundamental frequency" (or its inverse, "fundamental
period") and "sound pressure level (SPL)" are physical
entities measurable by machines.  There is much correlation
(basically a log function) between each pair, BUT many other
elements also factor into their relationships.  As I said
before, many a Ph.D. dissertation has been written and many
full professorships have been attained studying these other
relationships.
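As a rough illustration of the "basically a log function" remark, here is a small sketch of the two logarithmic scales most often used as first-order stand-ins: decibels of SPL for level and semitones for frequency. Both are still physical quantities computed by a machine, not perceptual measurements. The 20 micropascal reference is the standard one for SPL in air; the 440 Hz reference and the function names are assumptions made for the example.

```python
import math

P_REF = 20e-6   # reference RMS pressure for dB SPL in air, pascals

def spl_db(pressure_rms_pa):
    """Sound pressure level in dB re 20 micropascals."""
    return 20.0 * math.log10(pressure_rms_pa / P_REF)

def semitones_from(freq_hz, ref_hz=440.0):
    """Logarithmic frequency distance from ref_hz in equal-tempered semitones."""
    return 12.0 * math.log2(freq_hz / ref_hz)

level_1pa = spl_db(1.0)              # 1 Pa RMS, roughly 94 dB SPL
octave_up = semitones_from(880.0)    # a doubling of frequency is 12 semitones
print(round(level_1pa, 1), octave_up)
```

Equal steps on these scales correspond to equal ratios of the physical quantity, which is the sense in which the relationship is logarithmic; the "many other elements" mentioned above are exactly what such simple mappings leave out.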

"Heavy" and "weight" are another pair.  "Hot"/"cold" vs
"temperature".  "Color" vs "spectra".  This is all
Perception 101!  And if you want to work or even dabble in
the world of humans' response to physical events and don't
know S.S. Stevens, you should!  He did much to shape modern
psychophysics, starting back in the 1930s.

The good engineer and scientist makes these leaps back and
forth between perception and physical almost unconsciously
and may occasionally use the wrong term (as I myself have
been guilty of) in front of naive listeners, but they are
distinctly different terms with very different meanings.

The tree in the forest w/o a human to hear?  It had no
loudness, but high SPL.  My wife gave me a t-shirt that
reads:  "If a man makes a statement in the forest and there
is no woman to hear, is he still wrong?"

Want to start another discussion?  What is sound "quality"?

-- 
Chip Wood

"Jerry Avins" <jya@ieee.org> wrote in message
news:qP-dnUN44fkIgLreRVn-2w@rcn.net...
> fizteh89 wrote:
> > I understand that many people outside of speech/audio processing area
> > get easily confused when a discussion on pitch starts.
> > While it is certainly understandable for ordinary folks and novices, it
> > completely amazes me that some people in academia spend their entire
> > successful careers (in terms of number of published papers) and retire
> > as distinguished professors while still being totally confused about
> > the subject.