Reply by James Salsman July 24, 2005
>... had to write a FFT pitch detection thing in matlab. Actually one would
> question whether it would work with something other than just phone tones
> (eg to automatically determine each of the guitar or piano pitches in any
> song on a CD even if more than one note at once and with harmonics etc)....
The problem with "pitch detection" is that most voiced speech is
essentially two-note chords. Is the pitch of a vowel its high or its low
component? Most pitch tracking algorithms try to pick the fundamental, the
low note, but that is not entirely satisfying: some transitions produce
awkward jumps in the tracked pitch when the low component fades out and the
high component becomes the low one.

Sincerely,
James Salsman
--
www.readsay.com - maker of the ReadSay PROnounce English literacy system
400 MHz PDA included: $499 -- http://www.readsay.com/PROnounce.html
Reply by Jevan July 16, 2005

Well, it was a long time ago when I did signals & systems at ANU (2000) and we
had to write an FFT pitch detection thing in MATLAB. Actually one would
question whether it would work with something other than just phone tones
(e.g. to automatically determine each of the guitar or piano pitches in any
song on a CD, even with more than one note at once and with harmonics etc.).

A suggestion I saw from you is to take the waveform, shift it, and then
subtract it from itself. If the waveform were a sine wave of 5 Hz and you
shifted it by one period (1/5 s), you would get exactly zero (for example).
So you can see that this is a simple way of determining frequency
components, just by trying different shifts. The resulting values would not
be 'Fourier coefficients'; they would be some other kind of set of
coefficients.
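
A minimal sketch of the shift-and-subtract idea above, in Python (not from the original post). The sample rate, the signal length, and the range of shifts tried are assumptions chosen for illustration; the averaged absolute difference used here is what later posts in this thread call the AMDF.

```python
import math

def amdf(x, lag):
    """Mean absolute difference between x and a copy of x shifted by `lag` samples."""
    n = len(x) - lag
    return sum(abs(x[i] - x[i + lag]) for i in range(n)) / n

fs = 1000    # sample rate in Hz (assumed for illustration)
f0 = 5.0     # the 5 Hz sine wave from the post
x = [math.sin(2 * math.pi * f0 * i / fs) for i in range(2 * fs)]  # 2 seconds

# Try shifts between 0.05 s and 0.3 s; this range contains exactly one full period.
lags = range(50, 301)
best_lag = min(lags, key=lambda k: amdf(x, k))
estimated_hz = fs / best_lag
print(best_lag, estimated_hz)
```

The shift that gives the deepest null is the period in samples, so the frequency estimate is the sample rate divided by that shift. The search range is deliberately limited to one period, because shifts of two, three, or more periods cancel just as well.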

Presumably you would then have to build a template library of coefficients
(within some acceptable variance) that correspond to the expected
response for a particular instrument at a particular pitch. You'd need to
subtract the "guitar-A1" template from it, then subtract the "piano-C3"
template, then subtract the "Drum-Kick" template (each when one thinks it
was played), then see what was left. The order of the subtractions
shouldn't matter if your template sound is good enough (it must include
the coefficients for the resonant frequencies as well), just as it doesn't
matter, when we listen to music as humans, whether we isolate the guitar,
piano, drum or whatever sound to see what it's doing; that can be done just
by changing our focus.

I'm sure it's probably already been done before though.

Even if the computer can notate perfectly from CD for you, they still need
musicians for the "human aspect". The same goes for engineers etc., since
software can invent things for us (remember the article I sent you about
software that re-invented the last 10 years' worth of electronic inventions
made by engineers, using a partly "trial and error" algorithm for
choosing & connecting electronic parts, given known responses (behaviours)
for the parts, to automatically reach a particular system response
specification corresponding to each discovery). But we still need
something to do, so we would still train humans.

JP


"robert bristow-johnson" <rbj@audioimagination.com> wrote in message
news:BE7887DE.5F02%rbj@audioimagination.com...
> in article 4252FD0C.BDDBEDA9@mega-nerd.com, Erik de Castro Lopo at
> nospam@mega-nerd.com wrote on 04/05/2005 17:03:
>
> > dt@soundmathtech.com wrote:
> >>
> >> Didn't I tell everybody here that pitch detection problem is solved ?
> >>
> >> It seems that some people are unable to learn, or they are deliberately
> >> trying to mislead general public asking questions here.
> >>
> >> Go to http://www.soundmathtech.com/pitch for more information.
> >>
> >> Also, US Patent Application at http://www.uspto.gov/patft
> >> (Pub. No. 20030088401)
> >
> > That's one solution. My guess is that there are at least 10 other
> > possible solutions, at least one of which is better than whatever
> > is covered by that patent.
>
> i'm gonna download the MATLAB stuff and see if i can try it out.
>
> long ago (about 2002 when it came out) i took a good look at the ICASSP
> paper Dmitry published about this method, and concluded that there were many
> elements of the AMDF function in it. it was jazzed up in some ways, but it
> came down to subtracting shifted versions of the waveform from itself,
> applying the magnitude, subtracting that magnitude difference from some
> parameter (that inverts the measure), applying the unit step function,
> adding up a few of these (i thought it was exceedingly small number, like 3
> terms) and then doing some kind of statistical histogram thing and looking
> for a maximum. Dmitry has denied that it's merely a jazzed-up AMDF and has
> challenged me to produce some waveform that fools it (i said that one could
> since the unit step function throws away information and that can be
> exploited to fool the algorithm). i probably should put some time into
> answering that challenge but i don't have the time (it would take hours to
> days to really research the thing and i ain't a grad student anymore).
>
> anyway to Dmitry: i think you underestimate some of the expertise in pitch
> detection in both of these newsgroups. some of us have a great deal of
> experience in it and some theoretical expertise also. the overt confidence
> in your W.C. Fields sales technique might not be warranted.
>
> --
>
> r b-j rbj@audioimagination.com
>
> "Imagination is more important than knowledge."
Reply by Ronald H. Nicholson Jr. April 13, 2005
In article <Ov-dnWevlbH-csffRVn-oQ@rcn.net>, Jerry Avins  <jya@ieee.org> wrote:
>> I'm mostly interested in pitch detection for the stringed instrument
>> tuning problem (including very cheap pianos, since that's what I have),
>> where one must deal with missing fundamentals,
>
>Missing fundamental? In a piano? I doubt it, no matter how cheap.

I should have added that the microphone used for pitch detection was pretty
cheap also. If the low-octave fundamental frequencies weren't missing, they
seemed pretty well buried in roll-off loss and room noise.

IMHO. YMMV.
--
Ron Nicholson rhn AT nicholson DOT com http://www.nicholson.com/rhn/
#include <canonical.disclaimer> // only my own opinions, etc.
Reply by Jerry Avins April 11, 2005
Ronald H. Nicholson Jr. wrote:
> In article <BE7EFA90.6132%rbj@audioimagination.com>,
> robert bristow-johnson <rbj@audioimagination.com> wrote:
>
>>i've been doing stuff about pitch detection and pitch
>>shifting since 1989 and in 1992 some of my work ended up in a pretty high
>>end commercial product and some others since then
>
>
> Interesting stuff. What is the figure of merit for algorithms used
> for your pitch detecting/shifting work?
>
> I'm mostly interested in pitch detection for the stringed instrument
> tuning problem (including very cheap pianos, since that's what I have),
> where one must deal with missing fundamentals,
Missing fundamental? In a piano? I doubt it, no matter how cheap.
> not only deal with but
> potentially measure the inharmonicity, as well as measure to a tuning
> accuracy sufficient to determine the tuning used (Equal temperament versus
> Just temperament in some key, etc.) and the amount of "stretch" present
> (with a figure of merit in cents = 2^(1/1200)), while also making a quick
> and decent probability guess at which note was just played (similar to
> a very simplified form of the music transcription problem) as a human
> interface assist in auto-configuring the tuner.
>
> One of the best jobs of tuning my cheap spinet piano was done by a blind
> gent who was mostly, I think, just listening for beats between various
> overtones of simultaneously played notes.
That's how every good tuner does it. It's also the reason for stretch. I wish Igor Kipnis were still alive. He was a whiz at doing it who could also lucidly explain it all.
> I couldn't hear them well
> enough, so I eventually wrote some DSP & visualization software for
> my Mac and PalmPilot so that I could see them (e.g. 'scope the various
> harmonics of one string synced to differing harmonics of another string
> and/or to a calibrated tuning reference, etc.).

That sounds great. Lots of luck! (I'm sure there are lots of sites like
http://www-scf.usc.edu/~chinghuc/pitch_detection_algorithms.htm around. Have
you seen some?)

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by robert bristow-johnson April 10, 2005
in article d3c7bu$9t5$1@blue.rahul.net, Ronald H. Nicholson Jr. at
rhn@mauve.rahul.net wrote on 04/10/2005 17:55:

> What is the figure of merit for algorithms used
> for your pitch detecting/shifting work?

it's mostly subjective. if the pitch detector output is connected to a
simple single-tone synthesizer, how does the output tone sound given the
input (from some other instrument)? does it follow the input pitch well,
even when it varies? does it make octave errors? what's the pitch
acquisition time or throughput delay?

other than the degree of vibrato (how fast and how far the pitch deviates)
that can be tracked, i don't think i've quantitatively made any figure of
merit. some of these issues go away for non-real-time pitch detection.

--

r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
Reply by Ronald H. Nicholson Jr. April 10, 2005
In article <070420052152265179%x@x.x>, ben  <x@x.x> wrote:
>... -- so what's the difference? the frequency of
>spikes make a pitch right? lots of spikes per second -- high pitched
>sound. sure, the fart sample is a bit rougher and maybe gappier but
>it's still the same situation isn't it? i'm definitely not seeing the
>difference between frequency and pitch but i don't think i'm that
>fussed about the difference (although it is very interesting --
>certainly got me thinking). maybe it is important though, i'm not sure.

Lots of frequencies might sum together to something that is recognized
by the human ear/brain as just a single pitch, so there is not a one-to-one
relationship. These frequencies might not be exactly harmonically related
(e.g. the distance between the sound spikes might not quite be repeating
in a local pattern). The recognized pitch may be for a frequency that is
not at all present in the sound waveform (you can put a notch filter on
the frequency of a low piano key, and you'll still think the pitch was that
of that same low key).

IMHO. YMMV.
--
Ron Nicholson rhn AT nicholson DOT com http://www.nicholson.com/rhn/
#include <canonical.disclaimer> // only my own opinions, etc.
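
A small Python sketch of the "frequency not present in the waveform" point, offered as an illustration rather than anything from the post. It builds a tone from harmonics 2 to 5 of 110 Hz with no 110 Hz component at all, then measures the repetition period by shift-and-subtract. The sample rate, the equal partial amplitudes, and the lag range are assumptions.

```python
import math

fs = 11000                                 # sample rate in Hz (assumed; 1/110 s is exactly 100 samples)
partials = [220.0, 330.0, 440.0, 550.0]    # harmonics 2..5 of 110 Hz; 110 Hz itself is absent
x = [sum(math.sin(2 * math.pi * f * i / fs) for f in partials) for i in range(1100)]

def amdf(x, lag, n):
    """Mean absolute difference between x and x shifted by `lag`, over n samples."""
    return sum(abs(x[i] - x[i + lag]) for i in range(n)) / n

lags = range(20, 151)                      # candidate periods, about 73 Hz to 550 Hz
n = len(x) - max(lags)                     # same window length for every lag
best_lag = min(lags, key=lambda k: amdf(x, k, n))
print(fs / best_lag)
```

The waveform still repeats every 1/110 s, so the period measurement lands on 110 Hz even though a spectrum of this signal has nothing there.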
Reply by Ronald H. Nicholson Jr. April 10, 2005
In article <BE7EFA90.6132%rbj@audioimagination.com>,
robert bristow-johnson  <rbj@audioimagination.com> wrote:
>i've been doing stuff about pitch detection and pitch
>shifting since 1989 and in 1992 some of my work ended up in a pretty high
>end commercial product and some others since then

Interesting stuff. What is the figure of merit for algorithms used
for your pitch detecting/shifting work?

I'm mostly interested in pitch detection for the stringed instrument
tuning problem (including very cheap pianos, since that's what I have),
where one must deal with missing fundamentals, not only deal with but
potentially measure the inharmonicity, as well as measure to a tuning
accuracy sufficient to determine the tuning used (Equal temperament versus
Just temperament in some key, etc.) and the amount of "stretch" present
(with a figure of merit in cents = 2^(1/1200)), while also making a quick
and decent probability guess at which note was just played (similar to
a very simplified form of the music transcription problem) as a human
interface assist in auto-configuring the tuner.

One of the best jobs of tuning my cheap spinet piano was done by a blind
gent who was mostly, I think, just listening for beats between various
overtones of simultaneously played notes. I couldn't hear them well
enough, so I eventually wrote some DSP & visualization software for
my Mac and PalmPilot so that I could see them (e.g. 'scope the various
harmonics of one string synced to differing harmonics of another string
and/or to a calibrated tuning reference, etc.).

IMHO. YMMV.
--
Ron Nicholson rhn AT nicholson DOT com http://www.nicholson.com/rhn/
#include <canonical.disclaimer> // only my own opinions, etc.
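
For readers unfamiliar with the unit: one cent is a frequency ratio of 2^(1/1200), so an octave is 1200 cents and an equal-tempered semitone is 100. A short Python sketch of the conversion follows; the function name and the example frequencies are illustrative choices, not taken from the post.

```python
import math

def cents(f_ref, f):
    """Interval from f_ref up to f in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f / f_ref)

one_cent_ratio = 2 ** (1 / 1200)      # frequency ratio of a single cent

print(cents(440.0, 880.0))            # one octave
print(cents(440.0, 440.8))            # a tone 0.8 Hz sharp of 440 Hz, roughly 3 cents
print(cents(1.0, 3.0) - 1900.0)       # exact 3:1 ratio vs 19 equal-tempered semitones, roughly 2 cents
```

A tuner that wants to tell equal temperament from just intonation, or to measure stretch, has to resolve differences of this size, a few cents or less.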
Reply by robert bristow-johnson April 10, 2005
in article d3bqle$msd$1@blue.rahul.net, Ronald H. Nicholson Jr. at
rhn@mauve.rahul.net wrote on 04/10/2005 14:18:

> In article <BE7E289E.610D%rbj@audioimagination.com>,
> robert bristow-johnson <rbj@audioimagination.com> wrote:
>> ... the AMDF or ASDF will find the best fit for the period, which is
>> influenced by all of the harmonics, and the harmonics greater in amplitude
>> will influence the measure more. the reciprocal of that would be called the
>> fundamental frequency, but it might not be exactly the same frequency as the
>> 1st harmonic. as in the case above, if there was zero amplitude at 109.8 (i
>> dunno what meaning that precise frequency would have) but a decent amount of
>> energy at 220, 330.3, 440.8, 551.5, the AMDF will not measure a period of
>> 1/109.8, but will be shorter than 1/110 because of the other harmonics.
> ...
>> if you have a good (and short) sound file of
>> a note or even just a collection of amplitudes and frequencies that you
>> think would fool this, i might want to try it with a MATLAB kludge to see
>> if it does.
>
> Your above example might work. With the spectral peak at 220. But
> I wouldn't use zero dB at 109.8, maybe some amount well under whatever
> the telco roll-off below 300 Hz is instead.

you mean zero linear amplitude at 109.8, i believe. 0 dB is a relative
measure that can't include 0 linear amplitude. "0 dB" might mean
full-scale.

anyway, Ronald, i've been doing stuff about pitch detection and pitch
shifting since 1989 and in 1992 some of my work ended up in a pretty high
end commercial product and some others since then (if you wanna know who,
i'll name-drop in a private e-mail). i have some good experience of what
works and some other experience on what doesn't. i don't have any patents
(except one stupid one from Fostex) with my name on it (because all of
these companies have been relying on trade-secret, a line i haven't been
crossing since AMDF and ASDF are in the literature) but there are some good
patents to look at and see what some of my competitors have done.

go to the USPTO and check up on "Brian Gibson" of IVL (they have a search
facility) and you can see some nice patents on pitch detection and pitch
shifting. some of them were patenting the obvious, but that was in his
business plan also. there are also some clever little ideas that he scooped
me on.

--

r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
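
The dB point can be made concrete with a few lines of Python. This is a generic sketch of the usual 20*log10 amplitude convention; the function names and the default reference of 1.0 (standing in for full scale) are assumptions for illustration.

```python
import math

def amplitude_to_db(a, ref=1.0):
    """Level of linear amplitude `a` relative to `ref`, in dB."""
    if a == 0.0:
        return float("-inf")          # zero amplitude has no finite dB value
    return 20.0 * math.log10(abs(a) / ref)

def db_to_amplitude(db, ref=1.0):
    """Linear amplitude corresponding to a level in dB relative to `ref`."""
    return ref * 10.0 ** (db / 20.0)

print(amplitude_to_db(1.0))           # amplitude equal to the reference
print(amplitude_to_db(0.0))           # silence
print(db_to_amplitude(-80.0))         # a very quiet component
```

An amplitude equal to the reference is 0 dB, while true silence is minus infinity dB, which is why "zero dB at 109.8" and "zero amplitude at 109.8" describe opposite situations.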
Reply by Ronald H. Nicholson Jr. April 10, 2005
In article <BE7E289E.610D%rbj@audioimagination.com>,
robert bristow-johnson  <rbj@audioimagination.com> wrote:
>... the AMDF or ASDF will find the best fit for the period, which is
>influenced by all of the harmonics, and the harmonics greater in amplitude
>will influence the measure more. the reciprocal of that would be called the
>fundamental frequency, but it might not be exactly the same frequency as the
>1st harmonic. as in the case above, if there was zero amplitude at 109.8 (i
>dunno what meaning that precise frequency would have) but a decent amount of
>energy at 220, 330.3, 440.8, 551.5, the AMDF will not measure a period of
>1/109.8, but will be shorter than 1/110 because of the other harmonics.
...
>if you have a good (and short) sound file of
>a note or even just a collection of amplitudes and frequencies that you
>think would fool this, i might want to try it with a MATLAB kludge to see
>if it does.

Your above example might work. With the spectral peak at 220. But
I wouldn't use zero dB at 109.8, maybe some amount well under whatever
the telco roll-off below 300 Hz is instead.

IMHO. YMMV.
--
Ron Nicholson rhn AT nicholson DOT com http://www.nicholson.com/rhn/
#include <canonical.disclaimer> // only my own opinions, etc.
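
The quoted example is easy to try numerically. The Python sketch below (not the MATLAB kludge mentioned in the quote) sums partials at 220, 330.3, 440.8 and 551.5 Hz with nothing at 109.8 Hz and looks for the lag that minimizes the average squared difference (ASDF). The equal partial amplitudes, the 200 kHz sample rate (chosen only to get fine lag resolution without interpolation), and the search range are all assumptions.

```python
import math

fs = 200000                                   # high sample rate (assumed) for fine lag resolution
partials = [220.0, 330.3, 440.8, 551.5]       # frequencies from the quoted example; nothing at 109.8 Hz
x = [sum(math.sin(2 * math.pi * f * i / fs) for f in partials) for i in range(20000)]

def asdf(x, lag, n):
    """Mean squared difference between x and x shifted by `lag`, over n samples."""
    return sum((x[i] - x[i + lag]) ** 2 for i in range(n)) / n

lags = range(1790, 1841)                      # periods corresponding to about 108.7 to 111.7 Hz
n = len(x) - max(lags)                        # same window length for every lag
best_lag = min(lags, key=lambda k: asdf(x, k, n))
estimated_hz = fs / best_lag
print(estimated_hz)
```

With these assumptions the best-fit period comes out shorter than 1/110 s, a compromise among the per-partial values 220/2, 330.3/3, 440.8/4 and 551.5/5. Different partial amplitudes would shift the compromise, since louder partials weigh more.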
Reply by robert bristow-johnson April 10, 2005
in article d39gmo$vqr$1@blue.rahul.net, Ronald H. Nicholson Jr. at
rhn@mauve.rahul.net wrote on 04/09/2005 17:16:
 
> If the spectral energy peaks are very closely but not exactly harmonically
> related (which the physics of some real-world resonators can produce),
> a sub-multiple of the lowest frequency might be what a human would call
> the approximate pitch, but a sub-multiple of an even higher frequency
> present might be what a musician would call the exact pitch relative to
> other simultaneous musical notes present.
>
> A good example might be a spinet piano, where a slightly flat low-A
> (say 109.8 Hz)
what is 109.8 Hz? is it the frequency of the bottom overtone (often called the fundamental)? or is it the reciprocal of the period? especially in the situation you describe below, they are not exactly the same thing. the AMDF or ASDF measures the period.
> played through a telco quality circuit might only have
> frequency content above 200 Hz, but would still be heard as a low-A,
> two octaves below concert-A, in appropriate context, even with little
> spectral energy in that range.
yup. and the measured period will be about 1000/109.8 milliseconds. but possibly not exactly.
> But if the near 4th harmonic peaked at 440.8 Hz,
you mean there's a formant (or resonance) at around 440 Hz making the 4th harmonic particularly loud compared to others? that will increase its influence on the measured period.
> and this waveform was played against a simultaneous exact 440
> Hz concert-A flute tone, thus producing a noticeable beat, the low-A
> piano note might be perceived as slightly #sharp in pitch, not flat.
that may be true, but i am not sure that the AMDF will see it any differently. especially if the 109.8 Hz component was killed by an HPF, then the period *will* be determined as the greatest common factor of the remaining harmonics and if they are sharper than their integer harmonic index times the 109.8 Hz component, the AMDF will arrive at a pitch that is higher than 109.8.
> Humans may also be more sensitive to pitch errors in the middle of
> the audio spectrum, versus in the lower or higher frequency ranges.

that may be, but is still not the issue. just like for a VU meter, you could run the audio through something like an A-weighting filter to emphasize frequency components in the 2 to 5 kHz range and de-emphasize components in the highest and lowest octaves before the AMDF algorithm sees it.
> Thus the pitch in the above situation, to a piano tuner, might be best
> considered as closer to 440.8/4 = 110.2 Hz, and neither, say, at 220 Hz,
> where there might be the highest absolute spectral peak (according to
> an FFT maxima), nor at the fundamental 109.8 Hz string resonance that
> started off this overtone sequence (and which an AMDF or autocorrelation
> algorithm might hunt and find).
no. the AMDF or ASDF will find the best fit for the period, which is
influenced by all of the harmonics, and the harmonics greater in amplitude
will influence the measure more. the reciprocal of that would be called the
fundamental frequency, but it might not be exactly the same frequency as the
1st harmonic. as in the case above, if there was zero amplitude at 109.8 (i
dunno what meaning that precise frequency would have) but a decent amount of
energy at 220, 330.3, 440.8, 551.5, the AMDF will not measure a period of
1/109.8, but will be shorter than 1/110 because of the other harmonics.

i know about sharpened harmonics in many fixed string instruments with
increasing harmonic number (due to stiffness at the string termination that
effectively shortens the string, particularly for high amplitude hits). i
know that piano tuners may very well tune higher notes slightly sharp, in
comparison to their mathematical value in an equally tempered scale, to line
up octaves to power of 2 harmonics from lower notes. for 12 note/octave
equal temperament, we don't line up the other harmonics, say the 3rd to
exactly 19 semitones up, because 3 does not exactly equal 2^(19/12). i know
about some tones possibly having missing fundamental (and possibly other
harmonics). it's also possible that the fundamental, even when it is there,
does not exactly equal the reciprocal of the measured period, because of the
aggregate influence of the other harmonics.

that doesn't change anything. tonal musical notes are quasi-periodic and,
for those kinds of notes, our most salient cue for pitch will be the
reciprocal of the period, and the AMDF or ASDF is designed to best estimate
that period.

now there are problems. there is the classic "octave problem" (it could be
with other harmonic intervals, too, but most often, if there is an
ambiguity, it's about an octave). this comes from the fact that a 110 Hz
note that is added to a *very* quiet 55 Hz note (say, at -80 dB relative to
the 110 Hz note) will look like a 55 Hz note mathematically, but will sound
like a 110 Hz note. so there needs to be a little brains built into the
AMDF analysis to avoid picking the null at 1/55 sec just because it is ever
so slightly lower than the null at 1/110 sec. somehow you want to choose
the first really good looking null, even if the null at twice the lag is
very slightly better. that's the main problem with AMDF or ASDF.

i don't see the situation you described as being a problem. if you have a
good (and short) sound file of a note or even just a collection of
amplitudes and frequencies that you think would fool this, i might want to
try it with a MATLAB kludge to see if it does.

--

r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
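
The octave problem described above can be reproduced in a few lines. The Python sketch below is only an illustration of the idea of taking the first really good null; the 1% tolerance, the sample rate, and the lag range are arbitrary assumptions, not the post's method or any product's.

```python
import math

fs = 11000                              # sample rate in Hz (assumed; 1/110 s = 100 samples)
quiet = 10.0 ** (-80.0 / 20.0)          # -80 dB relative to the 110 Hz component
x = [math.sin(2 * math.pi * 110.0 * i / fs)
     + quiet * math.sin(2 * math.pi * 55.0 * i / fs) for i in range(2200)]

lags = list(range(50, 251))             # candidate periods from 220 Hz down to 44 Hz
n = len(x) - lags[-1]                   # same window length for every lag

def amdf(lag):
    """Mean absolute difference between x and x shifted by `lag` samples."""
    return sum(abs(x[i] - x[i + lag]) for i in range(n)) / n

d = {k: amdf(k) for k in lags}
deepest_lag = min(lags, key=d.get)      # strict global minimum: the null at 1/55 s

# "A little brains": accept the first lag whose null is nearly as deep as the best one.
tolerance = 0.01 * max(d.values())      # 1% of the AMDF peak; an arbitrary illustrative threshold
first_good_lag = next(k for k in lags if d[k] <= d[deepest_lag] + tolerance)

print(fs / deepest_lag, fs / first_good_lag)
```

Taking the strict minimum reports 55 Hz because the inaudible subharmonic makes the null at 1/55 s very slightly deeper. Accepting the first null within the tolerance reports 110 Hz, which matches what is heard. How large the tolerance should be for real signals is a tuning question this sketch does not answer.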