TIMIT database and pitch detection

Started by Ross Clement (Email address invalid - do not use) January 6, 2006
/* cross-posted to both comp.dsp and comp.speech.research */

Hi everyone.

I note that a lot of papers on pitch detection use the TIMIT database
for experiments. However, as far as I can see from the readme files,
the database does not include verified pitch information for the
sentences. How then do people measure the accuracy of their
pitch-detection methods? I haven't yet read in detail every one of the
massive pile of pitch detection papers I have printed out, but I keep
coming across TIMIT, and when looking through the online documentation
for TIMIT I can't see any references to files containing pitch
information.

I note that MOCHA-TIMIT has a Laryngograph file which will allow more
accurate estimation of pitch than would be expected from the recording
of the voice. Does TIMIT include this information too? If not, what
methods are people using to estimate the accuracy of pitch detection on
this data?

Cheers,

Ross-c

"Ross Clement (Email address invalid - do not use)" <clemenr@wmin.ac.uk> writes:

> /* cross-posted to both comp.dsp and comp.speech.research */
>
> Hi everyone.
>
> I note that a lot of papers on pitch detection use the TIMIT database
> for experiments. However as far as I can see from readme files, the
> database does not include verified pitch information for the sentences.
> How then do people measure the accuracy of their pitch-detection
> methods. I haven't yet read in detail every one of the massive pile of
> pitch detection papers I have printed out, but I keep on coming across
> TIMIT, but when looking through the online documentation for TIMIT, I
> can't see references to files containing the pitch information.
>
> I note that MOCHA-TIMIT has a Laryngograph file which will allow more
> accurate estimation of pitch than would be expected from the recording
> of the voice. Does TIMIT include this information too. If not, what
> methods are people using to estimate the accuracy of pitch using this
> data.

Unfortunately there is no "truth" when it comes to F0: the boundaries
between voiced and unvoiced regions are not fully defined, so people
have to trust some hand-labelled or hand-corrected databases for
measurement.

I believe that the paper on the F0 extraction algorithm YIN by Alain de
Cheveigne and Hideki Kawahara has details of test databases for F0
extraction (Google will find it).

I should add that we also distribute a TIMIT database with Laryngograph
data, http://www.festvox.org/dbs/dbs_kdt.html, which, unlike MOCHA's UK
English databases, is US English.

Alan

Alan W Black                      email: awb@cs.cmu.edu
Language Technologies Institute   http://www.cs.cmu.edu/~awb/
Carnegie Mellon University        tel: +1-412-268-6299
5000 Forbes Ave, Pittsburgh PA, 15213, USA.   fax: +1-412-268-6298

> Cheers,
>
> Ross-c
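
To make the measurement question concrete: once a reference F0 track is available (for example one derived from a Laryngograph signal or from hand-corrected labels), one common way to score a pitch detector is a gross error rate over the frames the reference marks as voiced. The sketch below is only an illustration under stated assumptions. The function name, the convention that 0 Hz marks an unvoiced frame, and the 20% tolerance are all choices made here, not something taken from TIMIT or from any particular paper's protocol.

```python
def gross_error_rate(reference_hz, estimate_hz, tolerance=0.2):
    """Fraction of reference-voiced frames where the estimate is off by
    more than `tolerance` (relative to the reference F0).

    Assumptions (hypothetical conventions, not from any database spec):
    - both tracks are frame-aligned and of equal length
    - a reference value <= 0 marks an unvoiced frame, which is skipped
    """
    if len(reference_hz) != len(estimate_hz):
        raise ValueError("tracks must have the same number of frames")

    voiced_frames = 0
    gross_errors = 0
    for ref, est in zip(reference_hz, estimate_hz):
        if ref <= 0:
            # Unvoiced in the reference: not scored by this metric.
            continue
        voiced_frames += 1
        if abs(est - ref) > tolerance * ref:
            gross_errors += 1

    if voiced_frames == 0:
        raise ValueError("reference contains no voiced frames")
    return gross_errors / voiced_frames


# Made-up example tracks in Hz; frame 3 contains an octave error.
reference = [0.0, 100.0, 102.0, 105.0, 0.0, 200.0]
estimate = [0.0, 101.0, 204.0, 104.0, 90.0, 195.0]
print(gross_error_rate(reference, estimate))
```

Note that this metric says nothing about voiced/unvoiced decisions: the estimate of 90 Hz on a frame the reference calls unvoiced is simply ignored, which is one reason the choice of reference labels matters so much.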