DSPRelated.com
Forums

Music tones detection

Started by Andrea June 16, 2004
Hello,
please excuse me in advance if this topic has already been posted, I
wasn't able to find it.

Let's assume I have a wav file which contains a non-overlapping
sequence of the notes of the scale (A at 440 Hz, A# at 466 Hz, B at
494 Hz... G# at 830 Hz), each one with a fixed duration of, e.g., 1 s.

What I'm trying to do is the following: I need to analyse the "song"
and determine which note is being played at different times. I know I
have to look at the frequency spectrum, but I don't know how...

1. Shall I use DFT (which takes complex numbers when I only have real
   ones) or shall I use another transform (for ex. DCT)?

2. Do I have to perform the transform several times using windows of
   different sizes (e.g., if the file is sampled at 44100 Hz, I
   perform the transform the first time with a window of
   44100/440 = 100 samples and look for the first harmonic, a second
   time with 44100/466 = 95 samples, and so on...)?

3. If point 2 is correct (i.e., I have to perform the transform
   several times using different window sizes), how can I speed up
   the computation? If I can't choose the sample set arbitrarily,
   how can I use an FFT that takes only data with a "pow2 size"?

Thanks a lot
Ciaccia

Ciao Andrea,

I would suggest consulting the papers on automatic music
transcription written by

Bello and Monti of Queen Mary University, London
http://www2.elec.qmul.ac.uk/~juan/
and
Klapuri of Tampere University of Tech.
http://www.cs.tut.fi/~klap/iiro/index.html

Best regards
Massimiliano Tonelli

"Massimiliano Tonelli" <tonelli@anwida.com> wrote:

>Ciao Andrea
>
>I would suggest you to consult the papers about automatic music
>transcription written by
>
>Bello and Monti of Queen Mary University, London
>http://www2.elec.qmul.ac.uk/~juan/
>and
>Klapuri of Tampere University of Tech.
>http://www.cs.tut.fi/~klap/iiro/index.html
>
>Best regards
>Massimiliano Tonelli

Great pages, and great research. It's not surprising that some of it
is being done in London, since I know they're doing a ton of great
audio and DSP-related work there (I would guess that, after the US,
England has the next most developed DSP industry and use of DSP
engineers).

I found Klapuri's general remark about polyphonic transcription
especially interesting:

http://www.cs.tut.fi/~klap/iiro/overview2001/problem.html

I made some attempts in this area using frequency transforms, and had
some luck with single-note compositions, so I try to keep up with the
latest research and literature. His remark sums up well the general
impression I've received.

Regards,
Robert

www.gldsp.com
( modify address for return email )

www.numbersusa.com
www.americanpatrol.com

ciaccia@gmail.com (Andrea) wrote:

>1. Shall I use DFT (which takes complex numbers when I only have real
>   ones) or shall I use another transform (for ex. DCT)?

I haven't used the DCT much, but the DFT, computed with an FFT, might
be a start. There are versions that take real input and produce
complex output. Or you can simply set the imaginary part of the input
to zero.

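To illustrate the point about real input, here is a minimal sketch in plain Python. The naive `dft` helper and the 64-sample cosine test signal are assumptions made for illustration, not anything from the thread; a real program would use an FFT library instead. It shows that passing real samples directly and passing complex samples with a zero imaginary part give the same spectrum, and that only bins 0..N/2 are needed for real input.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT of a sequence of real or complex samples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Assumed test signal: 64 real samples holding exactly 4 cycles of a cosine.
N = 64
real_samples = [math.cos(2 * math.pi * 4 * t / N) for t in range(N)]

# Option 1: pass the real samples directly (Python promotes them to complex).
spectrum = dft(real_samples)

# Option 2: build complex samples explicitly, imaginary part set to zero.
spectrum_zero_imag = dft([complex(s, 0.0) for s in real_samples])

# For real input the spectrum is conjugate-symmetric, so bins 0..N/2 carry
# all the information; the upper half mirrors the lower half.
magnitudes = [abs(c) for c in spectrum[: N // 2 + 1]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
```

Here `peak_bin` is the index of the strongest bin in the lower half of the spectrum; multiplying it by sample rate / N would give the corresponding frequency.
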
>3. If point 2 is correct (i.e., I have to perform the transform
>   several times using different window sizes), how can I speed up
>   the computation? If I can't choose the sample set arbitrarily,
>   how can I use an FFT that takes only data with a "pow2 size"?

You can "zero-pad" by filling in the end of your samples with zeros,
until the length matches the power-of-2 size.

Regards,
Robert

www.gldsp.com
( modify address for return email )

www.numbersusa.com
www.americanpatrol.com

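The zero-padding suggestion can be sketched as follows. The helper names `next_pow2` and `zero_pad` are hypothetical, and the 100-sample frame simply mirrors the window size from the original question.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p

def zero_pad(samples):
    """Return a copy of samples extended with trailing zeros to a power-of-2 length."""
    padded_len = next_pow2(len(samples))
    return list(samples) + [0.0] * (padded_len - len(samples))

# Assumed input: a 100-sample window, as in the 44100/440 example.
frame = [0.5] * 100
padded = zero_pad(frame)
```

Note that zero-padding only interpolates the spectrum onto a finer frequency grid. It does not add resolution, which is still set by the number of actual samples in the window.
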
I'd say this would be swatting flies with a hammer. ;-) From what I
gather from Andrea's post, he is dealing with musically monophonic
signals. That case is much simpler than the polyphonic case (for which
there is still no general, robust "state of the art" pitch estimation
method that I am aware of).

Andrea, a good place to start is the literature on pitch
estimation using the short-time autocorrelation of a signal.
Rabiner & Schafer, "Digital Processing of Speech Signals", covers
this, for example. Although it's a bit old, it's still good reading
and a valuable resource.
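
As a rough illustration of short-time autocorrelation pitch estimation, here is a minimal sketch in plain Python. The function name `autocorr_pitch`, the 200-1000 Hz search range, the 2048-sample frame and the 440 Hz test tone are all assumptions chosen for this example; the approach in the book is more elaborate (windowing, normalization, voicing decisions).

```python
import math

def autocorr_pitch(frame, sample_rate, fmin=200.0, fmax=1000.0):
    """Estimate pitch (Hz) from the lag that maximizes the autocorrelation."""
    min_lag = int(sample_rate / fmax)   # shortest period searched
    max_lag = int(sample_rate / fmin)   # longest period searched
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Assumed test frame: 2048 samples of a 440 Hz sine at 44100 Hz.
RATE = 44100
frame = [math.sin(2 * math.pi * 440.0 * t / RATE) for t in range(2048)]
estimate = autocorr_pitch(frame, RATE)
```

Because the lag is an integer number of samples, the estimate is quantized (around 440 Hz the step is a few Hz at 44100 Hz), which is still fine for telling semitones apart in this range.
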

Picking around in the spectrum is a rather nasty thing to do, and
usually not required for musically monophonic signals ;-)

--smb

"Massimiliano Tonelli" <tonelli@anwida.com> wrote in message news:<ya0Ac.52289$zm5.27038@nntpserver.swip.net>...
> Ciao Andrea
>
> I would suggest you to consult the papers about automatic music
> transcription written by
>
> Bello and Monti of Queen Mary University, London
> http://www2.elec.qmul.ac.uk/~juan/
> and
> Klapuri of Tampere University of Tech.
> http://www.cs.tut.fi/~klap/iiro/index.html
>
> Best regards
> Massimiliano Tonelli

ciaccia@gmail.com (Andrea) wrote in message news:<a3d3d6e4.0406160739.1009206a@posting.google.com>...
> What I'm trying to do is the following: I need to analyse the "song"
> and determine which note is being played at different times. I know I
> have to look at the frequency spectrum, but I don't know how...

If you have access to MATLAB and the Signal Processing Toolbox, you
might want to check out the SPECGRAM function.

Rune

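For readers without MATLAB, the idea behind a spectrogram can be sketched in plain Python: cut the signal into short windowed frames, transform each frame, and note where the spectrum peaks over time. This is not SPECGRAM itself. The function name `peak_frequency_track`, the frame and hop sizes, the 8 kHz sample rate and the two-tone test signal are all assumptions for illustration, and the naive DFT stands in for an FFT.

```python
import cmath
import math

def peak_frequency_track(samples, sample_rate, frame_len=256, hop=128):
    """Return a list of (frame_start_time_s, peak_frequency_hz) pairs."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    # Precomputed complex exponentials shared by every DFT bin.
    twiddle = [cmath.exp(-2j * math.pi * m / frame_len)
               for m in range(frame_len)]
    track = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        best_bin, best_mag = 1, 0.0
        # Naive DFT over the positive-frequency bins (DC skipped).
        for k in range(1, frame_len // 2):
            acc = sum(frame[n] * twiddle[(k * n) % frame_len]
                      for n in range(frame_len))
            if abs(acc) > best_mag:
                best_bin, best_mag = k, abs(acc)
        track.append((start / sample_rate, best_bin * sample_rate / frame_len))
    return track

def tone(freq, count, sample_rate):
    """Generate count samples of a sine at freq Hz."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(count)]

# Assumed test signal: 440 Hz followed by 494 Hz, sampled at 8 kHz.
RATE = 8000
song = tone(440.0, 2048, RATE) + tone(494.0, 2048, RATE)
track = peak_frequency_track(song, RATE)
```

With these settings the bin spacing is 8000/256 = 31.25 Hz, so each reported peak is only accurate to about one bin. A longer frame gives finer frequency resolution at the cost of coarser time resolution.
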
On 16 Jun 2004 22:34:26 -0700, stephan.bernsee@web.de (Stephan M.
Bernsee) wrote:

>I'd say this would be swatting flies with a hammer. ;-) From what I
>gather from Andrea's post, he is dealing with musically monophonic
>signals. That case is much simpler than the polyphonic case (for which
>there is still no general, robust "state of the art" pitch estimation
>method that I am aware of).
>
>Andrea, a good place to start is the literature on pitch
>estimation using the short-time autocorrelation of a signal.
>Rabiner & Schafer, "Digital Processing of Speech Signals", covers
>this, for example. Although it's a bit old, it's still good reading
>and a valuable resource.
>
>Picking around in the spectrum is a rather nasty thing to do, and
>usually not required for musically monophonic signals ;-)

Some DSP technique may be the way to go (especially if there are
special considerations for fast detection time and accuracy), but it's
easy enough to detect pitch the 'crude' way: look for zero crossings
(preferably consecutive ones going the same direction, so you're
always looking at a full cycle of the input signal). Measuring the
time between them gives the period, and of course the reciprocal gives
the frequency. A best match against a table of the frequencies of
musical notes will tell you which note is playing.

Even a DSP is overkill here: you can feed the analog signal to a
comparator and then to a microcontroller's interrupt pin. Each
interrupt saves the value of a running timer (a 1 MHz clock gives good
resolution), and the period is the current timer value minus the
previous timer value. Instead of taking a reciprocal for the frequency
look-up, just make a table based on the periods of the notes of the
chromatic scale.

Can you tell I've thought this through? :)

-----
http://mindspring.com/~benbradley

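A software version of the zero-crossing idea might look like the sketch below. It works on sampled data rather than a comparator and timer, averages over all rising crossings in the buffer instead of using a single pair, and matches against a table of periods as suggested. The names `NOTE_PERIODS`, `rising_zero_crossings` and `detect_note`, the equal-tempered table starting at A = 440 Hz, and the test tone are assumptions for illustration.

```python
import math

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
# Equal-tempered chromatic scale from A = 440 Hz, stored as periods in seconds.
NOTE_PERIODS = [(name, 1.0 / (440.0 * 2 ** (i / 12)))
                for i, name in enumerate(NOTE_NAMES)]

def rising_zero_crossings(samples, sample_rate):
    """Times (s) of upward zero crossings, refined by linear interpolation."""
    times = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Fraction of a sample between prev and the interpolated zero.
            times.append((i - 1 + prev / (prev - cur)) / sample_rate)
    return times

def detect_note(samples, sample_rate):
    """Average the period between rising crossings and match the nearest note."""
    times = rising_zero_crossings(samples, sample_rate)
    if len(times) < 2:
        return None
    period = (times[-1] - times[0]) / (len(times) - 1)
    name, _ = min(NOTE_PERIODS, key=lambda entry: abs(entry[1] - period))
    return name, 1.0 / period

# Assumed test signal: 0.1 s of a 466.16 Hz sine with an arbitrary phase.
RATE = 44100
samples = [math.sin(2 * math.pi * 466.16 * t / RATE + 0.3) for t in range(4410)]
result = detect_note(samples, RATE)
```

`detect_note` returns a `(note_name, frequency_hz)` tuple, or `None` when fewer than two rising crossings are found. Like the hardware version, it assumes a clean signal with one rising crossing per cycle.
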
I'm not convinced :-)

Zero crossing detection will fail to detect the correct pitch if the
fundamental frequency is weak or not present in the sound. Depending
on the instrument, that might or might not be an issue for Andrea's
application. It certainly is for voice and instruments like saxophone.
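
The objection can be reproduced with a small experiment in plain Python. The test tone below is an assumption for illustration: it contains only the 2nd and 3rd harmonics of 220 Hz and no energy at 220 Hz itself. Both helper functions are hypothetical, simplified estimators, not anything posted in this thread.

```python
import math

def zero_crossing_pitch(samples, sample_rate):
    """Mean frequency implied by consecutive rising zero crossings."""
    rising = [i for i in range(1, len(samples))
              if samples[i - 1] < 0.0 <= samples[i]]
    if len(rising) < 2:
        return None
    mean_period = (rising[-1] - rising[0]) / (len(rising) - 1)
    return sample_rate / mean_period

def autocorr_pitch(samples, sample_rate, fmin=100.0, fmax=1000.0):
    """Pitch from the lag that maximizes the unnormalized autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Assumed test tone: harmonics at 440 Hz and 660 Hz, fundamental (220 Hz) absent.
RATE = 44100
tone = [math.sin(2 * math.pi * 440.0 * t / RATE)
        + math.sin(2 * math.pi * 660.0 * t / RATE)
        for t in range(2048)]

zc_estimate = zero_crossing_pitch(tone, RATE)   # misled by the extra crossings
ac_estimate = autocorr_pitch(tone, RATE)        # follows the 220 Hz periodicity
```

This waveform crosses zero upward three times per 1/220 s period, so the zero-crossing estimate should come out at roughly three times the perceived pitch, while the autocorrelation estimate should stay close to 220 Hz.
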

--smb