Forums

Pitch calculation (FFT or autocorrelation)

Started by gjunge June 5, 2006
Hi, 

I want to show a Pitch graph/curve of a wave file. I saw that good results
can be made by using AutoCorrelation. I have the autocorrelation function
and I also have the FFT function, but I don't know what I should put into
the autocorrelation function in order to get a pitch line. 
I have a wave file (spoken text), and also have the FFT results of the
wave file, how can I extract from this data a nice Pitch curve. The
purpose of the pitch line is to show the intonations of the text. 

If anybody could explain to me the steps to be taken in plain english,
pseudo code, or C#, VB(.NET) or Java, I would be very very thankfull.

Many Thanks, 
Gidon 


gjunge wrote:
> Hi, > > I want to show a Pitch graph/curve of a wave file. I saw that good results > can be made by using AutoCorrelation. I have the autocorrelation function > and I also have the FFT function, but I don't know what I should put into > the autocorrelation function in order to get a pitch line. > I have a wave file (spoken text), and also have the FFT results of the > wave file, how can I extract from this data a nice Pitch curve. The > purpose of the pitch line is to show the intonations of the text. > > If anybody could explain to me the steps to be taken in plain english, > pseudo code, or C#, VB(.NET) or Java, I would be very very thankfull. > > Many Thanks, > Gidon
I am not sure what you are asking. Do you want pitch as a function of time?
Neither FFT nor autocorrelation will do nowadays...

Pitch detection problem was laid to rest a few years ago...

Check this page: http://www.soundmathtech.com/pitch

And read US Patent Application 20030088401 at www.uspto.gov/patft
(Strictly non-commercial license applies)

That's what they use for pitch (fundamental frequency) detection
nowadays, but don't ask - nobody will admit it :-)


gjunge wrote:
> Hi, > > I want to show a Pitch graph/curve of a wave file. I saw that good results > can be made by using AutoCorrelation. I have the autocorrelation function > and I also have the FFT function, but I don't know what I should put into > the autocorrelation function in order to get a pitch line. > I have a wave file (spoken text), and also have the FFT results of the > wave file, how can I extract from this data a nice Pitch curve. The > purpose of the pitch line is to show the intonations of the text. > > If anybody could explain to me the steps to be taken in plain english, > pseudo code, or C#, VB(.NET) or Java, I would be very very thankfull. > > Many Thanks, > Gidon
fizteh89 wrote:
> Neither FFT nor autocorrelation will do nowadays... > > Pitch detection problem was laid to rest a few years ago... > > Check this page: http://www.soundmathtech.com/pitch
this website says: "Currently the demonstration software is provided in the form of several Matlab P-files. Download the compressed ZIP-archive pdemo.zip (151 Kb) Unpack it in the Matlab work directory. Then type pdemo at the Matlab command prompt. You should see the GUI window coming up. You can now load and process speech files in wav or ascii formats (Clean speech sampled at 16 kHz is assumed)." i downloaded pdemo and unzipped to a PC with MATLAB, changed the directory, and typed in pdemo at the prompt. this is what i get: =BB pwd ans =3D C:\Program Files\MATLAB\pdemo =BB dir . nhist.p selpeaks.p xdistall.p .=2E pdemo.p shehadyourdark.wav embed.p pframe.p showframe.p histsm.p ptrack.p xdist.p =BB pdemo ??? c:\program files\matlab\pdemo\pdemo.p is not a MATLAB P-file. is there something i'm doing wrong? it doesn't work. i've read your paper, you have read my response to it years ago: http://groups.google.com/group/comp.dsp/browse_thread/thread/5f5b466f059eb9= ce http://groups.google.com/group/comp.dsp/msg/9daa9e1f9a0dcba0 and i have yet to hear how well this pitch detector works and sounds.
> And read US Patent Application 20030088401 at www.uspto.gov/patft > (Strictly non-commercial license applies) > > That's what they use for pitch (fundamental frequency) detection > nowadays, but don't ask - nobody will admit it :-)
who is "they"? if there are all sorts of unidentified people or companies using your algorithm that will not say so, what meaning can we ascribe to the statement above? Dmitry, i am not writing your PDA off, i just haven't been able to see anything concrete other than your paper http://www.soundmathtech.com/pitch/paper.html and the issues i have brought up regarding that have not been addressed. best regards, r b-j
Robert, not to offend you or anybody else in this group: by "they" I
certainly didn't mean some regulars here or some mom-and-pop workshops
out there.
"They" know who they are and what they are doing all too well...

I just downloaded and unzipped the code and it works fine for me, aside
from giving some warning messages. You need Windows version of Matlab
though and mine is pretty old, but it should be backward compatible.
I am sure other people out there were able to run the demo. "They" also
know how to search IFW of a pending US patent application to get all
the public info, including the source code submitted with provisional
application...
BTW, I posted some Matlab source here, in comp.dsp, some time ago.

As for your "unanswered questions"...,well, you promised to come up
with some waveform to fool my PDA, but the truth is: there is no such
waveform.
ANY (benigh) periodic waveform you can come up with will produce
perfect pitch peaks - at 1, 2, 3... pitch periods - an "ideal" pitch
detector indeed.
This is as good as it gets - there will never be anything better for
pitch/fundamental period detection.
And your favorite example of mixing very weak 100 Hz sine wave with
strong 200 Hz sine wave just proves the validity of the method: the
true fundamental frequency is 100 Hz and the method is more sensitive
than human ear, subject only to a graceful degradation of accuracy with
increasing signal-to-noise ratio...

I hope that I don't sound paternalistic or arrogant. After all, it's
just a very small (yet essential) part of DSP and I DID spent a LOT of
time and effort to work it out in sufficient detail for DSP
practitioners.



robert bristow-johnson wrote:
> fizteh89 wrote: > > Neither FFT nor autocorrelation will do nowadays... > > > > Pitch detection problem was laid to rest a few years ago... > > > > Check this page: http://www.soundmathtech.com/pitch > > this website says: > > "Currently the demonstration software is provided in the form of > several Matlab P-files. > > Download the compressed ZIP-archive pdemo.zip (151 Kb) > > Unpack it in the Matlab work directory. Then type pdemo at the Matlab > command prompt. You should see the GUI window coming up. You can now > load and process speech files in wav or ascii formats (Clean speech > sampled at 16 kHz is assumed)." > > > > > i downloaded pdemo and unzipped to a PC with MATLAB, changed the > directory, and typed in pdemo at the prompt. this is what i get: > > =BB pwd > > ans =3D > > C:\Program Files\MATLAB\pdemo > > =BB dir > > . nhist.p selpeaks.p xdistall.p > > .. pdemo.p shehadyourdark.wav > embed.p pframe.p showframe.p > histsm.p ptrack.p xdist.p > > =BB pdemo > ??? c:\program files\matlab\pdemo\pdemo.p is not a MATLAB P-file. > > is there something i'm doing wrong? it doesn't work. > > i've read your paper, you have read my response to it years ago: > > http://groups.google.com/group/comp.dsp/browse_thread/thread/5f5b466f059e=
b9ce
> http://groups.google.com/group/comp.dsp/msg/9daa9e1f9a0dcba0 > > and i have yet to hear how well this pitch detector works and sounds. > > > And read US Patent Application 20030088401 at www.uspto.gov/patft > > (Strictly non-commercial license applies) > > > > That's what they use for pitch (fundamental frequency) detection > > nowadays, but don't ask - nobody will admit it :-) > > who is "they"? if there are all sorts of unidentified people or > companies using your algorithm that will not say so, what meaning can > we ascribe to the statement above? > > Dmitry, i am not writing your PDA off, i just haven't been able to see > anything concrete other than your paper > http://www.soundmathtech.com/pitch/paper.html and the issues i have > brought up regarding that have not been addressed. >=20 > best regards, >=20 > r b-j
fizteh89 wrote:
> Robert, not to offend you or anybody else in this group: by "they" I > certainly didn't mean some regulars here or some mom-and-pop workshops > out there. > "They" know who they are and what they are doing all too well...
impressive.
> I just downloaded and unzipped the code and it works fine for me, aside > from giving some warning messages. You need Windows version of Matlab
i wrote:
> > i downloaded pdemo and unzipped to a PC with MATLAB, changed the > > directory, and typed in pdemo at the prompt. ...
> though and mine is pretty old, but it should be backward compatible.
and MATLAB preaches strict backward compatibility. why does my windows version of MATLAB say the following?:
> > =BB pdemo > > ??? c:\program files\matlab\pdemo\pdemo.p is not a MATLAB P-file.
i'm not averse to fixing something in my MATLAB to get it to work, but i have no idea why it would do that. i download, unzip, and run it just as you web page says, and i can't get it to work.
> I am sure other people out there were able to run the demo. "They" also > know how to search IFW of a pending US patent application to get all > the public info, including the source code submitted with provisional > application... > BTW, I posted some Matlab source here, in comp.dsp, some time ago.
i started at http://groups.google.com/groups/profile?enc_user=3DriwiGBQAAAC5hnaoSNtzWu2W= z9LFDnOUOPANdqfI6prRsqjc7uCt1A and couldn't find it. i found some interesting quotes: Re: 23k$US for Matlab Embedded DSP development? fizteh89 wrote:
> The world is not flat as far as local girls and local restaurants are > concerned. > > But, of course, some people prefer cyberfood or cybersex...
> As for your "unanswered questions"...,well, you promised to come up > with some waveform to fool my PDA, but the truth is: there is no such > waveform.
i am not going to go through the effort of implementing the algorithm that is described in your paper. if i can get a simple script from you that does only what is fully described in your paper (that leaves the SVD out, at least until the next pass) and not including any trade secreted stuff that might be in your pdemo.p file if it worked. if you do that, i will test it with waveforms i can either find or generate and give you (and post) an answer of how it went. if i find a waveform that messes it up, i will post the MATLAB script that made it, or if it is a recorded pitched signal, i'll email it to anyone who asks (in reasonable quantity).
> ANY (benigh) periodic waveform you can come up with will produce > perfect pitch peaks - at 1, 2, 3... pitch periods - an "ideal" pitch > detector indeed.
what if the peak is a little better at 2 or 3 than at one? which one do you pick? always the one with the maximum histogram? if that is the case, i know exactly how to construct a waveform that you think is at period 1 if you listened to it, but will show peak 2 (at twice the lag as peak 1) to be the maximum histogram (or the maxium of any other algorithm that picks the true and smallest period of a perfectly period signal in zero noise). to choose peak 1 over peak 2 when an inuadible but non-zero subharmonic (say one octave below the pitch you think you hear) requires something else that is not in your paper. if you always pick the very best peak in that histogram in your paper, it is susceptable to the same kind of octave error problem other PDAs have (unless there's something else there other than picking the "true" or "mathematically best" period).
> This is as good as it gets - there will never be anything better for > pitch/fundamental period detection.
you should sell cars. you're really good at it.
> And your favorite example of mixing very weak 100 Hz sine wave with > strong 200 Hz sine wave just proves the validity of the method: the > true fundamental frequency is 100 Hz and the method is more sensitive > than human ear, subject only to a graceful degradation of accuracy with > increasing signal-to-noise ratio...
whoa! woo-haa! that'll teach me not to read to the bottom of the post before replying! (actually it hadn't in the past.) so you're saying that if you were listening to a pure sine wave synthesized by the pitch information that comes from your pitch detector as it is tracking some given input like a human voice in speech or singing, and if an inaudible subharmonic (say -80 dB of the main tone) somehow finds its way into your tone, the PDA (and your synthesized sine wave) will immediately jump down an octave and then toggle back up when the subharmonic is gone (possibly multiple times during the tone duration)? if that is the case, i would say that your PDA is not useful for tough pitch detection applications. that behavior represents a failure of correct operation of a PDA. it is an "octave error", a glitch of some manner.
> I hope that I don't sound paternalistic or arrogant.
no, of course not.
> After all, it's > just a very small (yet essential) part of DSP and I DID spent a LOT of > time and effort to work it out in sufficient detail for DSP practitioners.
that you have spent a lot of time working it out (and writing the paper and filing the patent and creating the website and posting the pdemo.p file that i can't get to work in MATLAB) is not the issue. nobody is disputing that. r b-j
gjunge wrote:
> I want to show a Pitch graph/curve of a wave file. I saw that good results > can be made by using AutoCorrelation. I have the autocorrelation function > and I also have the FFT function, but I don't know what I should put into > the autocorrelation function in order to get a pitch line. > I have a wave file (spoken text), and also have the FFT results of the > wave file, how can I extract from this data a nice Pitch curve. The > purpose of the pitch line is to show the intonations of the text.
Google for "phase vocoder" for how to use FFT results of a series of windows of data offset in time; and for "cepstral" or "cepstrum" methods to help with pitch determination of ambiguous sets of frequencies. IMHO. YMMV. -- Ron N. http://www.nicholson.com/rhn
#include "stdafx.h"

#define	W_max		256	// Max. Number of window length
#define PI			3.14159265358979
#define W_length	256		// window length
#define S_length	128		// shift length
#define P		256	       // r(k) order

// Function headers
void CalcAuto(double *pX, double *pRxx);
void FrameProcessing(int index, short *pData, double *pHcoef, double
*pX);
double GetPitch(double *pRxx, int Fs);

// Global variable
FILE *stream;



void main(void)
{
	CWaveSoundRead *pWaveFileReader;

	// =EC=9B=A8=EC=9D=B4=EB=B8=8C =ED=8C=8C=EC=9D=BC =EA=B4=80=EB=A0=A8 =EB=
=B3=80=EC=88=98
	int nFileSize		=3D	0;										// wav file=EC=9D=98 =ED=81=AC=EA=B8=B0
	int SamplingRate	=3D	0;										// wav file=EC=9D=98 sampling rate
	int Resolution		=3D	0;										// wav file=EC=9D=98 resolution
	short *BUFFER		=3D	NULL;									// =EB=B2=84=ED=8D=BC (=EC=9B=A8=EC=9D=B4=
=EB=B8=8C =EB=8D=B0=EC=9D=B4=ED=84=B0
=EC=A0=80=EC=9E=A5=EC=9A=A9)
	pWaveFileReader		=3D	NULL;
	BYTE *pbWavData;												// Pointer to actual wav data
	UINT cbWavSize;													// Size of data
	char fileName[20];												// =EC=9B=A8=EC=9D=B4=EB=B8=8C =ED=8C=8C=EC=
=9D=BC =EC=9D=BD=EA=B8=B0=EC=9A=A9
=ED=8C=8C=EC=9D=BC=EC=9D=B4=EB=A6=84

	// Pitch extraction =EA=B4=80=EB=A0=A8 =EB=B3=80=EC=88=98
	int tframe		=3D	0;											// Number of frames
	int datalength	=3D	0;											// data length.
	double pitch	=3D	0;											// pitch =EA=B3=84=EC=82=B0=EA=B2=B0=EA=B3=
=BC
	double h[W_length];												// Hamming window coef.
	double x[W_length];												// Hamming windowed data
	double Rxx[W_length];											// autocorrelation
	double *Pitch;													// fundamental freq. (pitch)

	sprintf(fileName, "communication.wav");

	// =EA=B3=84=EC=82=B0=EA=B0=92 =ED=99=95=EC=9D=B8=EC=9C=84=ED=95=9C file p=
rocessing
	stream =3D fopen("Result.txt", "w");

	pWaveFileReader=3Dnew CWaveSoundRead();

	// Wav file =EC=97=B4=EA=B8=B0.
	if(FAILED( pWaveFileReader->Open((LPSTR)(LPCTSTR)fileName) ))
	{
		cout<<"Failed to Open the wave file"<<endl;
		exit(0);
	}
	nFileSize	=3D	pWaveFileReader->m_ckIn.cksize;					// =ED=8C=8C=EC=9D=BC =
=ED=81=AC=EA=B8=B0
	pbWavData	=3D	new BYTE[nFileSize];							// BYTE: 8 bits
	BUFFER		=3D	new short[nFileSize/2];							// =EB=B2=84=ED=8D=BC

	// =EC=9B=A8=EC=9D=B4=EB=B8=8C =ED=8C=8C=EC=9D=BC =EC=9D=BD=EA=B8=B0
	if(FAILED( pWaveFileReader->Read(nFileSize, pbWavData, &cbWavSize) ))
	{
		cout<<"Failed to Read the wave file"<<endl;
		exit(0);
	}
	pWaveFileReader->Reset();										// file=EC=9D=84 =EC=B4=88=EA=B8=B0 =EC=
=9C=84=EC=B9=98=EB=A1=9C
=EC=B4=88=EA=B8=B0=ED=99=94.
	SamplingRate	=3D	pWaveFileReader->m_pwfx->nSamplesPerSec;	// wav file=EC=
=9D=98
sampling rate.
	Resolution		=3D	pWaveFileReader->m_pwfx->wBitsPerSample;	// wav file=EC=9D=
=98
resolution.

	// =EB=B2=84=ED=8D=BC=EC=97=90 =EC=A0=80=EC=9E=A5
	memcpy(BUFFER, pbWavData, nFileSize);

	// wav file =EC=A0=95=EB=B3=B4 =EC=B6=9C=EB=A0=A5
	fprintf(stream, "Sampling rate: %d \n", SamplingRate							);
	fprintf(stream, "Resoultion   : %d \n", Resolution								);
	fprintf(stream, "File size    : %d \n", nFileSize								);
	fprintf(stream, "Size of data : %d \n", cbWavSize								);
	fprintf(stream,
"---------------------------------------------------------	\n"	);


	// frame =EA=B0=AF=EC=88=98 =EA=B3=84=EC=82=B0
	datalength =3D nFileSize/2;										// wav file=EC=9D=98 =EB=8D=B0=EC=9D=
=B4=ED=84=B0 =EA=B0=AF=EC=88=98
	tframe =3D (int)((datalength-W_length)/S_length) + 1;

	// =EA=B0=81 frame=EC=9D=98 pitch=EB=A5=BC =EC=A0=80=EC=9E=A5=ED=95=98=EA=
=B8=B0=EC=9C=84=ED=95=9C =EB=B3=80=EC=88=98
	Pitch =3D new double[tframe];




	for (int i=3D0; i<W_length; i++)
	{
		h[i] =3D 0.54 - 0.46*cos(2*PI*i/(double)(W_length-1));
		fprintf(stream, "h[%d]: %f \n", i, h[i]);
	}



	// =EB=A7=A4 frame =EB=A7=88=EB=8B=A4 pitch =EA=B2=80=EC=B6=9C=ED=95=A8.
	for (int frame=3D0; frame<tframe; frame++)
	{
		// Frame processing
		FrameProcessing(frame, BUFFER, h, x);
		// Autocorrelation analysis
		CalcAuto(x, Rxx);


		// Pitch extraction
		Pitch[frame] =3D GetPitch(Rxx, SamplingRate);



	}

	delete pbWavData;
	delete pWaveFileReader;
	pWaveFileReader->Close();
	pWaveFileReader=3DNULL;

	delete [] Pitch;



	fclose(stream);
}


/*--------------------------------------------------------------------
double GetPitch(double *pRxx,	// autocorrelation
			    int	   Fs)		// sampling rate
--------------------------------------------------------------------*/
double GetPitch(double *pRxx,int Fs)
{


	double s1			=3D 0;										// slope 1
	double s2			=3D 0;										// slope 2
	double peakValue	=3D 0;										// peak=EC=A0=90=EC=97=90=EC=84=9C=EC=9D=
=98 r(k) =EA=B0=92.
	double peakIndex	=3D 0;										// peak index.
	double energy		=3D 0;										// short-time energy (r(0))
	double f0_temp		=3D 0;										// temporary f0
	double f0			=3D 0;										// fundamental freq. (pitch)


	energy =3D 10 * log10(pRxx[0]);

	// speech =EA=B5=AC=EA=B0=84=EC=9D=BC =EA=B2=BD=EC=9A=B0.
	if (energy > 40)
	{
		// pre-emphasis factor =EA=B3=84=EC=82=B0. voiced/unvoiced =EA=B5=AC=EB=
=B6=84
		double mu =3D pRxx[1]/pRxx[0];

		// voiced=EC=9D=BC =EA=B2=BD=EC=9A=B0.
		if (mu > 0.5)
		{
			// peak-picking
			for (int k=3D0; k<W_length-1; k++)
			{
				s1 =3D pRxx[k+1] - pRxx[k];
				s2 =3D pRxx[k+2] - pRxx[k+1];

				// =EA=B8=B0=EC=9A=B8=EA=B8=B0=EB=A5=BC =EC=9D=B4=EC=9A=A9 =EC=B5=9C=EB=
=8C=80 peak =EB=B0=8F peak index =EA=B2=80=EC=B6=9C.
				if (s1>0 && s2<0)
				{
					if (pRxx[k+1] > peakValue)
					{
						peakValue =3D pRxx[k+1];
						peakIndex =3D k+1;
					}
				}
			}

			// pitch =EA=B3=84=EC=82=B0
			if (peakIndex-1 > 0)
			{
				f0_temp =3D Fs/(peakIndex-1);
			}
			// =EB=82=98=EB=88=97=EC=85=88=EC=9D=B4 0=EC=9D=B4 =EB=90=98=EB=8A=94 =
=EC=A1=B0=EA=B1=B4.
			else if (peakIndex-1 =3D=3D 0)
			{
				f0_temp =3D 0;
			}

			// pitch =EC=A1=B4=EC=9E=AC=EB=B2=94=EC=9C=84=EC=97=90 =EC=9E=88=EC=9D=
=84 =EA=B2=BD=EC=9A=B0=EC=97=90=EB=A7=8C f0 =EC=84=A4=EC=A0=95 (40~400 Hz)
			if (f0_temp < 40)
			{
				f0 =3D 0;
			}
			else if (f0_temp > 400)
			{
				f0 =3D 0;
			}
			else
			{
				f0 =3D f0_temp;
			}
		}
		// unvoiced=EC=9D=BC =EA=B2=BD=EC=9A=B0.
		else
		{
			f0 =3D 0;
		}
	}
	// silence =EA=B5=AC=EA=B0=84=EC=9D=BC =EA=B2=BD=EC=9A=B0.
	else
	{
		f0 =3D 0;
	}

	return f0;

}

/*--------------------------------------------------------------------
void CalcAuto(double *pX,		// windowed data
			  double *pRxx)		// autocorrelation
--------------------------------------------------------------------*/
void CalcAuto(double *pX, double *pRxx)
{
	int k;		// r(k)=EC=9D=98 index.
	int n;
	for (k=3D0; k<P; k++)
	{
		pRxx[k] =3D 0.0;
		for (n=3D0; n<W_length-k; n++)
		{
			pRxx[k] =3D pRxx[k] + pX[n]*pX[n+k];
		}
		pRxx[k] =3D pRxx[k]/W_length;
	}

}

/*--------------------------------------------------------------------
void FrameProcessing(int	index,		// Frame index
					 short	*pData,		// Wav data
					 double *pHcoef,	// Hamming window coef.
					 double *pX)		// Hamming windowed data
--------------------------------------------------------------------*/
void FrameProcessing(int index, short *pData, double *pHcoef, double
*pX)
{
	// Hamming Windowing & rectangular windowing
	/*
	pX[0] =3D pHcoef[0] * (double)pData[index*S_length];
	for (int i=3D1; i<W_length; i++)
	{
		pX[i] =3D pHcoef[i] * (double)pData[index*S_length + i];
	}
	*/

	// Rectangular windowing
	pX[0] =3D (double)pData[index*S_length];
	for (int i=3D1; i<W_length; i++)
	{
		pX[i] =3D (double)pData[index*S_length + i];
	}
}



gjunge wrote:
> Hi, > > I want to show a Pitch graph/curve of a wave file. I saw that good results > can be made by using AutoCorrelation. I have the autocorrelation function > and I also have the FFT function, but I don't know what I should put into > the autocorrelation function in order to get a pitch line. > I have a wave file (spoken text), and also have the FFT results of the > wave file, how can I extract from this data a nice Pitch curve. The > purpose of the pitch line is to show the intonations of the text. > > If anybody could explain to me the steps to be taken in plain english, > pseudo code, or C#, VB(.NET) or Java, I would be very very thankfull. >=20 > Many Thanks,=20 > Gidon
OK, Robert, just for you... I am posting it a second time.
This is cut-and-paste Matlab routine - just fix the wrapped around
comment lines.

Add some signal pre-processing (like low-pass filtering) as you wish...

You are still confused, my friend, about pitch vs. fundamental
frequency (or, rather, fundamental period)
Pitch is an auditory concept, it's tied to human perception.
PDA is a mathematical method measuring fundamental period (for
correlation, AMDF, ASDF, or my method) or fundamental frequency (for
Fourier-based methods) of a periodic signal.
(After "true" f0 is measured an adjustment can be made to accomodate
for limitations of a human auditory perception)

It is that simple: perception vs. strict math

And, BTW, I am not trying to sell you anything: I am a scientist and
inventor, not a salesman, for your information

Regards


**************************************************************************

This is a prototype software implementation of the invention.
It illustrates basic principles and operation of the new methods.
The m-files below (ptrack.m, pframe.m, embed.m, xdist.m, nhistmax.m,
histsmooth.m, localpeaks.m and, optionally, svdembed.m) should be
copied
to Matlab work directory.


Copyright (C) 2001 Dmitry E. Terez


DISCLAIMER: This software is provided WITHOUT ANY WARRANTY, without
even
implied statement of merchantability or fitness for a particular
purpose.


******************************** ptrack.m
********************************
function ptr =3D ptrack(x)
%PTRACK Compute pitch period track.
%   This MATLAB function computes pitch period track and performs
%   voiced/unvoiced segmentation of a speech signal.
%
%   x   - input speech signal vector (MATLAB double-precision array).
%   ptr - output vector of pitch period estimates for all fixed-size
frames.
%   Each frame is assigned either a pitch period value in integer
samples
%   (for frames determined to be voiced) or zero (for unvoiced
frames).
%   To obtain fundamental frequency f0 values from pitch period
values,
%   one needs to compute Fs/ptr for each of the voiced frames, where
%   Fs is a sampling frequency in Hz and ptr is pitch period in
samples.
%   (For unvoiced frames f0 is assigned zero value).
%   Further smoothing of the raw output pitch tracks may be desirable.
%
%   This program was tested under MATLAB 5.3.1 on Windows PC platform.
%   No additional toolboxes are required.
%   Parameters are approximately chosen to produce correct results in
%   most cases on clean female speech and male speech with f0 > 100Hz
%   from TIMIT database (16 kHz sampling, 16 bits per sample).
%   For other sampling rates and speech types (e.g. male speech with
f0<100Hz)
%   some parameters (e.g. frame size) will need to be modified.


% Frame parameters:
nwin  =3D 200;       % frame size in samples
nstep =3D 100;       % frame step in samples


% Pitch search range is 100 Hz - 400 Hz (at 16 kHz sampling rate):
low  =3D 40;  % smallest pitch period to search for (in samples)
high =3D 160; % largest pitch period to search for (in samples)


% Parameters for time-delay embedding:
ndim =3D 3;     % embedding dimension
ndel =3D 10;    % time delay in samples


% Parameters for pitch tracking and voicing decision:


% emin is a minimum RMS energy of voiced frame.
% This value may require adjustment.
% The present value is approximately chosen for clean speech
% signals with an amplitude normalized to be in [-1 +1] range
emin  =3D 0.015;


pdiff =3D 6;  % max. allowed pitch period difference in samples
            % between adjacent voiced frames
pdiff2 =3D 2.0*pdiff;


nback =3D 5;  % max. number of frames to track backward


ne =3D nwin - (ndim-1)*ndel; % Number of vectors in state space
                           % after embedding a speech frame


% Global variable KK is an array with 2 columns storing all
% possible non-repeating combinations of 2 integers
% with their absolute difference between low and high.
% It is computed only once using Matlab 5 function nchoosek
% and subsequently reused for each frame computation.
np =3D 1:ne;
nn =3D nchoosek(np,2);


dt =3D nn(:,2)-nn(:,1);
isel =3D find(dt > low & dt < high);


global KK
KK =3D nn(isel,:);


% Determine input length and required number of frames
nx=3Dlength(x);
nframes=3Dfloor(1+(nx-nwin)/nstep);


x=3Dx(:);  % Make input signal vector a column


% Normalize input signal amplitude to be in [-1 1] range
maxval=3Dmax(abs(x));
x=3Dx/maxval;


% Initialize arrays
ptr  =3D zeros(1,nframes); % Output vector to store final pitch values


pcan =3D cell(1,nframes);  % Cell array to store pitch candidates for
all frames


rel  =3D zeros(1,nframes); % Vector to store reliability measures for
all frames


en   =3D zeros(1,nframes); % Vector to store RMS energy for all frames


for i=3D1:nframes % The main loop going through all frames


 nstart=3D(i-1)*nstep+1;
 nend=3Dnstart+nwin-1;
 xf =3D x(nstart:nend);   % current effective frame


 % Process current effective frame
 [pe, rm, nsel, rsel, xen] =3D pframe(xf, ndim, ndel, low, high);


 pcan{i} =3D pe;     % array of pitch candidates
 rel(i)  =3D rm;     % reliability measure
 en(i)   =3D xen;    % RMS frame energy


 % Track pitch delaying final decisions by one or more frames.
 if(i > 2) % At least previous frame and next frame are needed


  jp =3D i-2; % Previous frame
  jc =3D i-1; % Current frame (delayed by one frame)
  jn =3D i;   % Next frame (same as current effective frame)


  if (en(jc) < emin) % If energy is too low assign unvoiced value
   ptr(jc) =3D 0;
  else   % Energy is sufficient
   if (rel(jc) =3D=3D 1)  % current frame is reliable
       if (ptr(jp) > 0)  % previous frame was found voiced
         if(abs(pcan{jc}(1)-ptr(jp))<pdiff) %cur. consistent with
prev.
            ptr(jc) =3D pcan{jc}(1); % assign pitch (lowest multiple)
         % Otherwise check next frame to see if it's reliable and
         % consistent with previous frame (current frame fall-out)
         elseif(rel(jn)=3D=3D1 & abs(pcan{jn}(1)-ptr(jp)) < pdiff2)
            ptr(jc)=3Dround(0.5*(ptr(jp)+pcan{jn}(1)));
         % Check if next frame is reliable and consistent with current
         elseif (rel(jn)=3D=3D1 & abs(pcan{jn}(1)-pcan{jc}(1)) < pdiff)
            ptr(jc)=3Dpcan{jc}(1); %can be real pitch doubling or
halfing
         else % If all checks fail assign unvoiced value to current
            ptr(jc) =3D 0;
         end
       else % previous frame was found unvoiced
         % Check for a start of voiced segment: need two consecutive,
         % reliable and consistent with each other frames
         if(rel(jn) =3D=3D 1 & abs(pcan{jn}(1)-pcan{jc}(1)) < pdiff)
            ptr(jc) =3D pcan{jc}(1);
            % If start of voiced segment is determined, try to track
            % pitch period backward for up to nback frames
            jj =3D jc-1;
            found =3D 1;
            nf =3D 0;
            while ( jj > 0 & ptr(jj) =3D=3D 0 & en(jj) > emin ...
                  & found =3D=3D 1 & nf < nback)
              parr =3D pcan{jj}; % vector of pitch candidates
              [mindiff,k] =3D min(abs(parr - ptr(jj+1)));
              if (mindiff < pdiff) % If found a match
                ptr(jj) =3D parr(k); % then assign the value found
              else % Stop backward tracking
                found =3D 0;
              end
              jj =3D jj - 1;
              nf =3D nf + 1;
            end
         else % Otherwise assign unvoiced value for now
            ptr(jc) =3D 0;
         end
       end
   else % current frame is unreliable
       if (ptr(jp)>0) % previous frame was found voiced
         % Search current frame for best match to previous
         parr =3D pcan{jc}; % vector of current pitch candidates
         [mindiff, k] =3D min(abs(parr - ptr(jp)));
         if (mindiff < pdiff)  % If found a good match
           ptr(jc) =3D parr(k); % then assign the value found
         else % Otherwise check if next frame
           if(en(jn)>emin & rel(jn)=3D=3D1 & ... % is reliable
             abs(pcan{jn}(1)-ptr(jp))<pdiff2) % and matches prev.
             ptr(jc)=3Dround(0.5*(ptr(jp)+pcan{jn}(1)));
           else % If all checks fail assign unvoiced value
             ptr(jc) =3D 0;
           end
         end
       else % previous frame was found unvoiced
         ptr(jc) =3D 0; % assign unvoiced value to current frame
       end
   end  % current frame reliable or unreliable


  end  % if energy < emin
 end % if i > 2
end % Main for loop
**************************************************************************



***************************** pframe.m
***********************************
function [pcan,rel,nsel,rsel,xen] =3D pframe(xf,ndim,ndel,low,high)
%PFRAME Determine pitch candidates for a speech frame.
%   This function determines pitch period candidates for
%   a signal frame of some predetermined number of samples.
%
%   INPUT:
%   xf   - input signal vector (speech frame)
%   ndim - embedding dimension
%   ndel - delay value for time-delay embedding (in integer samples)
%   low  - lower bound for pitch search in integer samples
%   high - upper bound for pitch search in integer samples
%
%   OUTPUT:
%   pcan - vector of pitch period candidates (in integer samples),
%          ordered by value from lowest to highest (if rel=3D=3D1),
%          or by corresponding peak magnitude from highest peak
%          to lowest (if rel=3D=3D0).
%   rel  - frame reliability measure:
%          rel=3D1 if the number of pitch candidates <=3D 3
%          and they are integer multiples (with some tolerance),
%          or there is only one pitch candidate;
%          rel=3D0 otherwise (or if no candidates are found).
%   nsel - number of distances selected for a histogram
%   rsel - neigborhood radius
%   xen  - RMS frame energy


% Parameters (in relative units - frame size independent):
rmax    =3D 0.15;  % maximal neigborhood radius in state space
hmax    =3D 0.9;   % high reference peak magnitude
href    =3D 0.8;   % reference peak magnitude
hmin    =3D 0.7;   % low reference peak magnitude
hrelmin =3D 0.6;   % minimal magnitude of highest peak to have rel=3D1
mtol    =3D 0.05;  % relative tolerance for testing pitch multiples
thld    =3D 0.4;   % threshold level


minsel  =3D 2;   % min. number of distances as fraction of nwin
maxpeaks =3D 6;  % max. number of pitch candidates to retain
smooth =3D 1;    % perform histogram smoothing before peak searching
minsep =3D 8;    % min. separation allowed between peaks (in samples)


nx =3D length(xf);     % number of samples in input frame
nselmin =3D minsel*nx; % min. number of distances to select for
histogram


% Embed signal using time-delay embedding
% (Normalization is also performed here in order for all
% output vectors to fit into a unit cube in state space)
xe =3D embed(xf, ndim, ndel);


% To use alternative SVD-embedding uncomment next 2 lines
%nsvd =3D 20;  % window size for SVD-embedding (can be adjusted)
%xe =3D svdembed(xf, ndim, nsvd);


% nxe is the number of vectors in state space
[nxe, junk] =3D size(xe);


maxdt =3D nxe - 1;  % maximal time separation (in samples) between
                  % vectors in state space (histogram length)


% Find and compute all spatial distances less than rmax
% and corresponding temporal separations
xd =3D xdist(xe, rmax);


% ntotal is the total number of distances less than rmax
[ntotal, junk] =3D size(xd);


dt =3D xd(:,2)'; % Extract vector of temporal separations
               % (sorted by corresponding spatial distances
               % from smallest to largest)


% First, select all found distances closer than rmax,
% compute normalized histogram and find highest peak m0
nsel =3D ntotal;
dtsel =3D dt(1:nsel);
[nhist, m0] =3D nhistmax(dtsel, maxdt, low, high);


% Try to adjust neigborhood radius in order to bring
% the highest peak to some desirable range, if possible
if ( m0 > hmax & ntotal > nselmin )
   nsel =3D nselmin;
   dtsel =3D dt(1:nsel);
   [nhist, m0] =3D nhistmax(dtsel, maxdt, low, high);
   if ( m0 < hmin )
      nsel =3D min(floor(nsel*href/m0),ntotal);
      dtsel =3D dt(1:nsel);
     [nhist, m0] =3D nhistmax(dtsel, maxdt, low, high);
   end
end


hist =3D nhist-thld*m0;          % Threshold histogram
hist1=3Dhist.*(sign(hist)+1)./2; % and half-way rectify


% (Optionally) smooth histogram before searching for peaks
if (smooth =3D=3D 1)
  hist1 =3D histsmooth(hist1);
end


% Find all local peaks between low and high bounds
% (with optional constraint on their proximity to each other)
peaks =3D localpeaks(hist1,low,high,minsep);


[npeaks, junk] =3D size(peaks); % total number of peaks found


% Analyze number and positions of found peaks
if (npeaks =3D=3D 0) % If no peaks are found


  pcan =3D 0;  % assign unvoiced value
  rel =3D 0;   % frame is unreliable


elseif (npeaks > 0 & npeaks <=3D 3) % 1, 2 or 3 peaks are found


  pi =3D sort(peaks(:,2));  % Sort peaks by position


  % Check for multiples (with some relative tolerance mtol)
  if (npeaks =3D=3D 1)
    mult =3D 1;
  elseif (npeaks =3D=3D 2 & abs(pi(2)-2.0*pi(1))/pi(1) < mtol)
    mult =3D 1;
  elseif (npeaks =3D=3D 3 & abs(pi(2)-2.0*pi(1))/pi(1) < mtol &...
                        abs(pi(3)-3.0*pi(1))/pi(1) < mtol)
    mult =3D 1;
  else
    mult =3D 0;
  end


  if (mult =3D=3D 1 & m0 > hrelmin) % if multiples and mo>hrelmin
    rel =3D 1; % proclaim current frame to be reliable
  else
    rel =3D 0; % otherwise frame is unreliable
  end
  pcan =3D pi; % pitch period candidates


else % more than 3 peaks found
  np =3D min(npeaks, maxpeaks);
  pcan =3D peaks(1:np,2); % retain np highest peaks as pitch cand.
  rel =3D 0;  % frame is unreliable
end % npeaks


rsel =3D xd(nsel,1);           % chosen neighborhood radius
xen =3D sqrt(sum(xf.*xf)/nx);  % RMS frame energy
**************************************************************************



****************************** embed.m
***********************************
function xe =3D embed(x, ndim, ndel)
%EMBED Embed a signal using time-delay embedding.
%   x    - input signal vector
%   ndim - embedding dimension
%   ndel - time delay in samples
%   xe   - output matrix with ndim columns and nr rows
%          (Each column represents a separate dimension and
%           each row represents a vector in state space)


nx =3D length(x); % input length


% Normalize input vector x to be in [0 1] range in order
% for a reconstructed trajectory to fit into a unit cube
% (Alternatively, this can be done to each column of xe)
xmax =3D max(x);
xmin =3D min(x);
xrange =3D xmax-xmin;
if (xrange > 0)
   x =3D (x - xmin)/xrange;
else
   x =3D zeros(size(x));
end


% Calculate the number of rows in the output array xe
nr =3D nx - (ndim-1)*ndel; % Number of vectors in state space
xe=3Dzeros(nr,ndim);       % Preallocate space for output array


% Embed using method of delays
for i =3D 1:ndim
   xe(:,i) =3D x(1+ndel*(i-1) : nr+ndel*(i-1));
end
**************************************************************************



****************************** xdist.m
***********************************
function xd =3D xdist(xe, rmax)
%XDIST Select and compute distances in state space and time
separations.
%   This function finds all distances less than rmax between pairs of
%   vectors in state space and computes corresponding time
separations.
%   INPUT:
%   xe   - input array with embedded signal (vectors in state space)
%   rmax - chosen neighborhood radius in state space
%   OUTPUT:
%   xd   - 2-column array of computed spatio-temporal distances:
%          Col. 1 has Euclidean distances less than rmax;
%          Col. 2 has corresponding time separations (in samples).
%   Array xd is sorted in ascending order on the 1st column.
%
%   (This function can be replaced with any suitable fast
%    neighbor-searching procedure to speed-up computations.)


global KK  % Global array KK of possible index combinations is reused


rmax2 =3D rmax*rmax; % Square of max. Euclidean distance


[mx,ndim] =3D size(xe); % mx   - number of vectors in state space
                      % ndim - embedding dimension


[mk,junk] =3D size(KK); % mk   - number of vector pairs


% Initialize a big array of spatio-temporal distances
% and fill 2nd column with temporal separations
xd =3D [zeros(mk,1) (KK(:,2)-KK(:,1))];


% Compute all squared Euclidean distances and fill 1st column
% (Distance measures other than Euclidean are also possible)
for i=3D1:ndim
   xd(:,1)=3Dxd(:,1) + (xe(KK(:,2),i)-xe(KK(:,1),i)).^2;
end


% Select only squared Euclidean distances less than rmax2
isel =3D find( xd(:,1) < rmax2 );
xd =3D xd(isel,:);


% Sort in ascending order based on distances in the first column
xd =3D sortrows(xd);


% For computational efficiency squared values of Euclidean distances
% should be used instead of actual distances:
% square root operation is not needed and is retained here
% only for illustrative purposes.
xd(:,1) =3D sqrt(xd(:,1));
**************************************************************************



***************************** nhistmax.m
*********************************
function [nhist, m0] =3D nhistmax(x, nbins, low, high)
%NHISTMAX Compute normalized histogram.
%   This function computes normalized unbiased histogram
%   of x distribution and finds maximum peak magnitude.


% Histogram of x distribution among nbins
hx =3D hist(x,1:nbins);


% Compute normalized unbiased histogram
unbias =3D nbins:-1:1;
nhist  =3D hx./unbias;


% Find maximal value between low and high bounds
m0 =3D max(nhist(low:high));
**************************************************************************



*************************** histsmooth.m
*********************************
function xs =3D histsmooth(x)
%HISTSMOOTH Smooth a histogram.
%   Simple 3-point moving-average smoothing:
%   xs(i) =3D a1*x(i-1) + a0*x(i) + a1*x(i+1)
%
%   (This function can be replaced with any suitable
%   histogram smoothing procedure)


 a0 =3D 0.5;
 a1 =3D 0.25;


 nx=3Dlength(x);


 xl =3D [x(2:nx) x(nx)];
 xr =3D [x(1) x(1:(nx-1))];


 xs =3Da0*x + a1*(xl+xr);
**************************************************************************



************************* localpeaks.m
***********************************
function peaks =3D localpeaks(hx, low, high, minsep)
%LOCALPEAKS Find positions and magnitudes of local peaks.
%   This function searches for all local maxima in
%   a histogram between low and high search bounds.
%   INPUT:
%   hx     - input histogram vector
%   low    - lower bound to search for peaks
%   high   - upper bound to search for peaks
%   minsep - minimal allowed separation between adjacent peaks
%
%   OUTPUT:
%   peaks =3D [pv pi] - Output array with 2 columns:
%           pv has peak magnitudes from highest to lowest,
%           pi has corresponding peak positions (in samples).


nx=3Dlength(hx); % total number of bins in a histogram


d=3Dsign(diff(hx));


peaks =3D [];
flag =3D 0;


% Find all local maxima between low and high bounds
for i=3Dlow:high-1
   if (d(i) =3D=3D 1.0)
      flag =3D 1;
      nstart =3D i;
   elseif (d(i) =3D=3D -1.0)
      if (flag =3D=3D 1)
         flag =3D 0;
         nend =3D i;
         pi =3D i - floor((nend-nstart)/2.0);
         pv =3D hx(pi);
         peaks =3D [peaks; pv pi];
      end
   end
end


[npeaks, junk] =3D size(peaks); % total number of peaks found


if (npeaks > 0)
 peaks=3Dflipud(sortrows(peaks)); % Sort based on peak heights


 % Optionally scan all peaks from highest to lowest and retain only
 % those separated by at least minsep samples from already selected.
 if (minsep > 2)
   seppeaks =3D [peaks(1,:)];  % select highest peak first
   for i=3D2:npeaks            % analyze remaining peaks
     [np, junk] =3D size(seppeaks);
     count=3D0;
     for j=3D1:np
       if(abs(peaks(i,2)-seppeaks(j,2)) < minsep)
          count =3D count+1;
       end
     end
     if (count =3D=3D 0)
        seppeaks =3D [seppeaks; peaks(i,:)];
     end
   end
   peaks =3D seppeaks;
 end


end
**************************************************************************



************************** svdembed.m
************************************
function xe =3D svdembed(x,ndim,nsvd)
%SVDEMBED Embed a signal using SVD-embedding procedure.
%   This is an alternative to time-delay embedding.
%
%   x    - input signal vector
%   ndim - embedding dimension
%   nsvd - SVD window size in samples
%   xe   - output matrix with ndim columns and nr rows
%          (Each column represents a separate dimension and
%           each row represents a vector in state space)


nx =3D length(x);
av=3Dmean(x);
x=3Dx-av;


nr =3D nx - (nsvd-1);
xs=3Dzeros(nr,nsvd);


for i =3D 1:nsvd
   xs(:,i) =3D x(i:(nr+i-1));
end


[U,S,V] =3D svd(xs,0);
Vd =3D V(:,1:ndim);
xe =3D xs*Vd;


[mr, junk] =3D size(xe);


% Normalize each column so that each dimension is in [0 1] range
for i =3D 1:ndim
  xmax =3D max(xe(:,i));
  xmin =3D min(xe(:,i));
  xrange =3D xmax-xmin;
  if (xrange > 0)
   xe(:,i) =3D (xe(:,i) - xmin)/xrange;
  else
   xe(:,i) =3D zeros(mr,1);
  end
end
**************************************************************************




robert bristow-johnson wrote:
> fizteh89 wrote: > > Robert, not to offend you or anybody else in this group: by "they" I > > certainly didn't mean some regulars here or some mom-and-pop workshops > > out there. > > "They" know who they are and what they are doing all too well... > > impressive. > > > I just downloaded and unzipped the code and it works fine for me, aside > > from giving some warning messages. You need Windows version of Matlab > > i wrote: > > > i downloaded pdemo and unzipped to a PC with MATLAB, changed the > > > directory, and typed in pdemo at the prompt. ... > > > though and mine is pretty old, but it should be backward compatible. > > and MATLAB preaches strict backward compatibility. why does my windows > version of MATLAB say the following?: > > > > =BB pdemo > > > ??? c:\program files\matlab\pdemo\pdemo.p is not a MATLAB P-file. > > i'm not averse to fixing something in my MATLAB to get it to work, but > i have no idea why it would do that. i download, unzip, and run it > just as you web page says, and i can't get it to work. > > > I am sure other people out there were able to run the demo. "They" also > > know how to search IFW of a pending US patent application to get all > > the public info, including the source code submitted with provisional > > application... > > BTW, I posted some Matlab source here, in comp.dsp, some time ago. > > i started at > http://groups.google.com/groups/profile?enc_user=3DriwiGBQAAAC5hnaoSNtzWu=
2Wz9LFDnOUOPANdqfI6prRsqjc7uCt1A
> > and couldn't find it. i found some interesting quotes: > > Re: 23k$US for Matlab Embedded DSP development? > fizteh89 wrote: > > The world is not flat as far as local girls and local restaurants are > > concerned. > > > > But, of course, some people prefer cyberfood or cybersex... > > > > As for your "unanswered questions"...,well, you promised to come up > > with some waveform to fool my PDA, but the truth is: there is no such > > waveform. > > i am not going to go through the effort of implementing the algorithm > that is described in your paper. if i can get a simple script from you > that does only what is fully described in your paper (that leaves the > SVD out, at least until the next pass) and not including any trade > secreted stuff that might be in your pdemo.p file if it worked. > > if you do that, i will test it with waveforms i can either find or > generate and give you (and post) an answer of how it went. if i find > a waveform that messes it up, i will post the MATLAB script that made > it, or if it is a recorded pitched signal, i'll email it to anyone who > asks (in reasonable quantity). > > > ANY (benigh) periodic waveform you can come up with will produce > > perfect pitch peaks - at 1, 2, 3... pitch periods - an "ideal" pitch > > detector indeed. > > what if the peak is a little better at 2 or 3 than at one? which one > do you pick? always the one with the maximum histogram? if that is > the case, i know exactly how to construct a waveform that you think is > at period 1 if you listened to it, but will show peak 2 (at twice the > lag as peak 1) to be the maximum histogram (or the maxium of any other > algorithm that picks the true and smallest period of a perfectly period > signal in zero noise). to choose peak 1 over peak 2 when an inuadible > but non-zero subharmonic (say one octave below the pitch you think you > hear) requires something else that is not in your paper. if you always > pick the very best peak in that histogram in your paper, it is > susceptable to the same kind of octave error problem other PDAs have > (unless there's something else there other than picking the "true" or > "mathematically best" period). > > > This is as good as it gets - there will never be anything better for > > pitch/fundamental period detection. > > you should sell cars. you're really good at it. > > > And your favorite example of mixing very weak 100 Hz sine wave with > > strong 200 Hz sine wave just proves the validity of the method: the > > true fundamental frequency is 100 Hz and the method is more sensitive > > than human ear, subject only to a graceful degradation of accuracy with > > increasing signal-to-noise ratio... > > whoa! woo-haa! that'll teach me not to read to the bottom of the post > before replying! (actually it hadn't in the past.) > > so you're saying that if you were listening to a pure sine wave > synthesized by the pitch information that comes from your pitch > detector as it is tracking some given input like a human voice in > speech or singing, and if an inaudible subharmonic (say -80 dB of the > main tone) somehow finds its way into your tone, the PDA (and your > synthesized sine wave) will immediately jump down an octave and then > toggle back up when the subharmonic is gone (possibly multiple times > during the tone duration)? if that is the case, i would say that your > PDA is not useful for tough pitch detection applications. that > behavior represents a failure of correct operation of a PDA. it is an > "octave error", a glitch of some manner. > > > I hope that I don't sound paternalistic or arrogant. > > no, of course not. > > > After all, it's > > just a very small (yet essential) part of DSP and I DID spent a LOT of > > time and effort to work it out in sufficient detail for DSP practitione=
rs.
> > that you have spent a lot of time working it out (and writing the paper > and filing the patent and creating the website and posting the pdemo.p > file that i can't get to work in MATLAB) is not the issue. nobody is > disputing that. >=20 > r b-j
fizteh89 wrote:
> OK, Robert, just for you... I am posting it a second time.
i missed it the first time. and i couldn't find it with Google groups, but i didn't go through each of your posts.
> This is cut-and-paste Matlab routine - just fix the wrapped around > comment lines.
i can do that. (and i will within a week i think.)
> Add some signal pre-processing (like low-pass filtering) as you wish...
i can do that, too. most likely i will couple your PDA output to an oscillator and synthesize the note that is tracking the input. maybe your code does it, i haven't examined it yet.
> You are still confused, my friend, about pitch vs. fundamental > frequency (or, rather, fundamental period)
i think i know the meaning of all three.
> Pitch is an auditory concept, it's tied to human perception.
no argument from me about that.
> PDA is a mathematical method measuring fundamental period (for > correlation, AMDF, ASDF, or my method) or fundamental frequency (for > Fourier-based methods) of a periodic signal.
i would define "PDA" to be closer to what the acronym stands for, a "Pitch Detection Algorithm". it is not soley a f0 estimator although there is a very strong relationship between f0 and pitch. the problem is that with many instruments and the human voice, both spoken and singing, sometimes the nature of the vibration of whatever it is (reed, lips, strings, vocal cords, or a column of air) is that one vibration is nearly identical to the next but slightly different but the following vibration is very much like the first. every odd cycle is more identical to each other than they are to the even cycles. so the f0 is truly one octave lower, but the human listening may not perceive it to be that lower octave if the even cycles are soooo similar to the odd cycles that the difference is hard to audibly measure (but a machine could, like a PDA). anyway, any application i can think of for a PDA desires the pitch returned to be the same as the perceived pitch. if the note sounds like middle C, the PDA should say it's middle C, not the C an octave below nor the F that is 7 semitones lower than that. if the PDA has trouble making up its mind about whether it is the low C or middle C, that will sound like crap when the PDA output is controlling an NCO (numerically controlled oscillator) because the NCO will jump back and forth between the different notes spuriously.
> (After "true" f0 is measured an adjustment can be made to accomodate > for limitations of a human auditory perception) > > It is that simple: perception vs. strict math > > And, BTW, I am not trying to sell you anything: I am a scientist and > inventor, not a salesman, for your information
what does the following sound like?:
>>> This is as good as it gets - there will never be anything better for >>> pitch/fundamental period detection.
... a scientist or a salesman? r b-j