comp.dsp | Music tones detection

Hello,
please excuse me in advance if this topic has already been posted, I
wasn't able to find it.

Let's assume I have a wav file which contains a non-overlapping
sequence of the notes of the scale (A at 440 Hz, A# at 466 Hz, B at
494 Hz ... G# at 830 Hz), each one with a fixed duration of, e.g., 1 s.

What I'm trying to do is the following: I need to analyse the "song"
and determine which note is being played at different times. I know I
have to look at the frequency spectrum, but I don't know how...

1. Should I use the DFT (which takes complex numbers, while I only have
   real ones) or another transform (e.g. the DCT)?

2. Do I have to perform the transform several times using windows of
   different sizes? For example, if the file is sampled at 44100 Hz,
   do I perform the transform first with a window of 44100/440 ≈ 100
   samples and look for the first harmonic, then with 44100/466 ≈ 95
   samples, and so on?

3. If point 2 is correct (i.e., I have to perform the transform
   several times using different window sizes), how can I speed up
   the computation? If I can't choose the number of samples
   arbitrarily, how can I use an FFT that only takes data with a
   power-of-2 size?

Thanks a lot
Ciaccia

Reply by Massimiliano Tonelli ●June 16, 2004

Ciao Andrea

 I would suggest consulting the papers about automatic music
transcription written by

Bello and Monti of Queen Mary University, London
http://www2.elec.qmul.ac.uk/~juan/
and
Klapuri of Tampere University of Tech.
http://www.cs.tut.fi/~klap/iiro/index.html

Best regards
Massimiliano Tonelli

Reply by ●June 16, 2004

"Massimiliano Tonelli" <tonelli@anwida.com> wrote:

>Ciao Andrea
>
> I would suggest consulting the papers about automatic music
>transcription written by
>
>Bello and Monti of Queen Mary University, London
>http://www2.elec.qmul.ac.uk/~juan/
>and
>Klapuri of Tampere University of Tech.
>http://www.cs.tut.fi/~klap/iiro/index.html
>
>Best regards
>Massimiliano Tonelli

Great pages, and great research.  Not surprising that some of it is
being done in London, since I know they're doing a ton of great audio
and DSP-related work there (I would guess that, after the US,
England has the next most developed DSP industry and use of DSP
engineers).

I especially found Klapuri's general remark about polyphonic
transcription interesting:

http://www.cs.tut.fi/~klap/iiro/overview2001/problem.html

I made some attempts in this area using frequency transforms, and had
some luck with single-note compositions.  So I try to keep up with some
of the latest research and literature, and his remark seemed to sum
up well the general impressions I've received.

Regards,

Robert

www.gldsp.com

( modify address for return email )

www.numbersusa.com
www.americanpatrol.com

Reply by ●June 17, 2004

ciaccia@gmail.com (Andrea) wrote:

>Hello,
>please excuse me in advance if this topic has already been posted, I
>wasn't able to find it.
>
>Let's assume I have a wav file which contains a non-overlapping
>sequence of the notes of the scale (A at 440 Hz, A# at 466 Hz, B at
>494 Hz ... G# at 830 Hz), each one with a fixed duration of, e.g., 1 s.
>
>What I'm trying to do is the following: I need to analyse the "song"
>and determine which note is being played at different times. I know I
>have to look at the frequency spectrum, but I don't know how...
>
>1. Should I use the DFT (which takes complex numbers, while I only have
>   real ones) or another transform (e.g. the DCT)?

I haven't used the DCT much, but the DFT, computed with an FFT, would
be a start. There are FFT versions that take real input and return
complex results.  Or you can just set the imaginary part of the input
to zero.
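
A minimal sketch of the second option (real samples fed to a complex FFT with the imaginary parts set to zero). The small radix-2 `fft` below is a textbook illustration in plain Python, not a library routine, and the 8000 Hz sample rate and 4096-point frame are assumptions for the demo only.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]  # a real sample becomes (value + 0j)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def peak_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest bin between DC and Nyquist."""
    spectrum = fft(samples)
    n = len(samples)
    # For real input the upper half mirrors the lower half, so skip it.
    k = max(range(1, n // 2), key=lambda i: abs(spectrum[i]))
    return k * sample_rate / n

SAMPLE_RATE = 8000  # assumed for the demo
N = 4096            # power of two
tone = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(N)]
print(peak_frequency(tone, SAMPLE_RATE))  # close to 440, limited by the bin spacing
```

For real input the output is conjugate-symmetric, so only the bins below N/2 carry independent information. That redundancy is what the real-input FFT variants exploit.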

>
>3. If point 2 is correct (i.e., I have to perform the transform
>   several times using different window sizes), how can I speed up
>   the computation? If I can't choose the number of samples
>   arbitrarily, how can I use an FFT that only takes data with a
>   power-of-2 size?

You can "zero-pad" by filling in the end of your samples with zeros,
until it matches the power-of-2 size.
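
A short sketch of that padding step. The helper names `next_pow2` and `zero_pad` are made up for this illustration.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p

def zero_pad(samples):
    """Return a copy of samples extended with zeros to a power-of-two length."""
    padded_len = next_pow2(len(samples))
    return list(samples) + [0.0] * (padded_len - len(samples))

frame = [0.5] * 1000         # 1000 samples, not a power of two
padded = zero_pad(frame)
print(len(padded))           # 1024
```

Note that zero-padding only interpolates the spectrum onto a finer grid of bins. The ability to separate two nearby frequencies is still set by the number of real samples in the frame.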

Regards,

Robert

www.gldsp.com

( modify address for return email )

www.numbersusa.com
www.americanpatrol.com

Reply by Stephan M. Bernsee ●June 17, 2004

I'd say this would be swatting flies with a hammer. ;-) From what I
gather from Andrea's post, he is dealing with musically monophonic
signals. That case is much simpler than the polyphonic case (for which
there is still no general, robust "state of the art" pitch estimation
method that I am aware of).

Andrea, a good place to start is looking for literature on pitch
estimation using the short-time autocorrelation of a signal. That
should get you started. Rabiner & Schafer, "Digital Processing of
Speech Signals", covers this, for example. Although it's a bit old,
it's still good reading and a valuable resource.
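
As a rough sketch of the idea (not the method from the book): take one short frame, compute its autocorrelation over the lags that correspond to plausible pitches, and take the lag with the largest value as the period. The function name, the 400 to 900 Hz search range (chosen to cover the 440 to 830 Hz notes in the question), and the 2048-sample frame are assumptions for illustration.

```python
import math

def autocorr_pitch(frame, sample_rate, fmin=400.0, fmax=900.0):
    """Estimate pitch (Hz) as sample_rate / lag, where lag maximizes the
    short-time autocorrelation within the range implied by fmin..fmax."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    if len(frame) <= max_lag:
        raise ValueError("frame is too short for the lowest pitch searched")
    terms = len(frame) - max_lag  # same number of products for every lag
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(terms))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag

SAMPLE_RATE = 44100
frame = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(2048)]
print(autocorr_pitch(frame, SAMPLE_RATE))  # near 440, quantized to whole-sample lags
```

The estimate is quantized to integer lags, and multiples of the true period also score highly, so the search range should be kept as narrow as the application allows.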

Picking around in the spectrum is a rather nasty thing to do, and
usually not required for musically monophonic signals ;-)

--smb

"Massimiliano Tonelli" <tonelli@anwida.com> wrote in message news:<ya0Ac.52289$zm5.27038@nntpserver.swip.net>...
> Ciao Andrea
> 
>  I would suggest consulting the papers about automatic music
> transcription written by
> 
> Bello and Monti of Queen Mary University, London
> http://www2.elec.qmul.ac.uk/~juan/
> and
> Klapuri of Tampere University of Tech.
> http://www.cs.tut.fi/~klap/iiro/index.html
> 
> Best regards
> Massimiliano Tonelli

Reply by Rune Allnor ●June 17, 2004

ciaccia@gmail.com (Andrea) wrote in message news:<a3d3d6e4.0406160739.1009206a@posting.google.com>...
> Hello,
> please excuse me in advance if this topic has already been posted, I
> wasn't able to find it.
> 
> Let's assume I have a wav file which contains a non-overlapping
> sequence of the notes of the scale (A at 440 Hz, A# at 466 Hz, B at
> 494 Hz ... G# at 830 Hz), each one with a fixed duration of, e.g., 1 s.
> 
> What I'm trying to do is the following: I need to analyse the "song"
> and determine which note is being played at different times. I know I
> have to look at the frequency spectrum, but I don't know how...

If you have access to MATLAB and the Signal Processing Toolbox,
you might want to check out the SPECGRAM function.
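
For readers without MATLAB, here is a rough stand-in in plain Python. It is not SPECGRAM: instead of a full FFT per frame it evaluates the DFT sum at a handful of chosen frequencies, which gives a coarse time-frequency picture. The function names, the 8000 Hz sample rate, and the frame and hop sizes are assumptions for this sketch.

```python
import cmath
import math

def dft_magnitude(frame, freq, sample_rate):
    """Magnitude of the frame's correlation with a complex tone at freq."""
    w = -2j * math.pi * freq / sample_rate
    return abs(sum(x * cmath.exp(w * n) for n, x in enumerate(frame)))

def coarse_spectrogram(samples, sample_rate, freqs, frame_len, hop):
    """One row per frame, one column per analysis frequency."""
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rows.append([dft_magnitude(frame, f, sample_rate) for f in freqs])
    return rows

SAMPLE_RATE = 8000
FREQS = [440.0, 466.16, 493.88]  # A, A#, B in equal temperament

def tone(freq, count):
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(count)]

# Two notes played one after the other, 0.25 s each.
signal = tone(440.0, 2000) + tone(493.88, 2000)
rows = coarse_spectrogram(signal, SAMPLE_RATE, FREQS, frame_len=1000, hop=1000)
strongest = [FREQS[max(range(len(FREQS)), key=row.__getitem__)] for row in rows]
print(strongest)  # strongest analysis frequency in each frame
```

Longer frames separate neighbouring notes more cleanly but blur the moment at which one note changes to the next, which is the usual spectrogram trade-off.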

Rune

Reply by Ben Bradley ●June 19, 2004

On 16 Jun 2004 22:34:26 -0700, stephan.bernsee@web.de (Stephan M.
Bernsee) wrote:

>I'd say this would be swatting flies with a hammer. ;-) From what I
>gather from Andrea's post, he is dealing with musically monophonic
>signals. That case is much simpler than the polyphonic case (for which
>there is still no general, robust "state of the art" pitch estimation
>method that I am aware of).
>
>Andrea, a good place to start is looking for literature on pitch
>estimation using the short-time autocorrelation of a signal. That
>should get you started. Rabiner & Schafer, "Digital Processing of
>Speech Signals", covers this, for example. Although it's a bit old,
>it's still good reading and a valuable resource.
>
>Picking around in the spectrum is a rather nasty thing to do, and
>usually not required for musically monophonic signals ;-)

   Some DSP technique may be the way to go (especially if there are
special considerations for fast detection time and accuracy), but it's
easy enough to detect pitch the 'crude' way: look for zero crossings
(preferably consecutive ones going in the same direction, so you're
always looking at a full cycle of the input signal). Measuring the
time between them gives the period, and of course the reciprocal gives
the frequency. A best match against a table of the frequencies of
musical notes will tell you which note is playing.
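
A minimal sketch of that approach on sampled data, assuming a clean tone with one rising zero crossing per cycle. The note table uses the standard equal-tempered names starting from A at 440 Hz, and the function names are made up for this example.

```python
import math

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def note_table(a4=440.0):
    """One octave of the equal-tempered chromatic scale, starting at A."""
    return [(name, a4 * 2 ** (i / 12)) for i, name in enumerate(NOTE_NAMES)]

def zero_crossing_frequency(samples, sample_rate):
    """Average frequency from rising zero crossings (linearly interpolated)."""
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:  # rising crossing between sample i-1 and i
            crossings.append(i - 1 + (-a) / (b - a))
    if len(crossings) < 2:
        return None
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period

def nearest_note(freq, table):
    """Best match on a logarithmic (musical) frequency scale."""
    return min(table, key=lambda entry: abs(math.log(freq / entry[1])))[0]

SAMPLE_RATE = 44100
tone = [math.sin(2 * math.pi * 494 * i / SAMPLE_RATE + 0.3) for i in range(4410)]
freq = zero_crossing_frequency(tone, SAMPLE_RATE)
print(freq, nearest_note(freq, note_table()))
```

Averaging over all crossings in the frame, rather than using a single pair, reduces the effect of noise on the period estimate.
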
   Even a DSP is overkill here: you can feed the analog signal to a
comparator and then to a microcontroller's interrupt pin. Each interrupt
saves the value of a running timer (a 1 MHz clock gives good
resolution), and the period is the current timer value minus the
previous one. Instead of taking a reciprocal for a frequency look-up,
just make a table based on the periods of the notes of the chromatic
scale.
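
The period-table idea can be simulated in Python as below. This only models the arithmetic, not real interrupt code; the 16-bit free-running timer and the function names are assumptions for illustration, while the 1 MHz clock is the figure mentioned above.

```python
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
TIMER_HZ = 1_000_000  # 1 MHz timer clock
TIMER_BITS = 16       # assumed width of the free-running timer

def period_table(timer_hz=TIMER_HZ, a4=440.0):
    """Note periods in timer ticks for one octave starting at A (440 Hz)."""
    return [(name, round(timer_hz / (a4 * 2 ** (i / 12))))
            for i, name in enumerate(NOTE_NAMES)]

def note_from_ticks(prev_ticks, curr_ticks, table, timer_bits=TIMER_BITS):
    """Match the interval between two captured timer values to a note."""
    # Modular subtraction keeps the result correct when the timer wraps.
    period = (curr_ticks - prev_ticks) % (1 << timer_bits)
    return min(table, key=lambda entry: abs(period - entry[1]))[0]

table = period_table()
print(table[0])                             # ('A', 2273)
print(note_from_ticks(65000, 1737, table))  # timer wrapped between captures
```

At 1 MHz, neighbouring notes in this octave are more than 60 ticks apart, so a one-tick timing error does not change the match.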

   Can you tell I've thought this through? :)

>--smb

-----
http://mindspring.com/~benbradley

Reply by Stephan M. Bernsee ●June 19, 2004

I'm not convinced :-)

Zero crossing detection will fail to detect the correct pitch if the
fundamental frequency is weak or not present in the sound. Depending
on the instrument, that might or might not be an issue for Andrea's
application. It certainly is for voice and instruments like saxophone.
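
A small synthetic illustration of this failure mode, under stated assumptions: the test signal contains only the 2nd and 3rd harmonics of 220 Hz (no energy at 220 Hz itself), sampled at 22000 Hz so that one 220 Hz period is exactly 100 samples. Both estimators are simplified sketches written for this example.

```python
import math

def zero_crossing_frequency(samples, sample_rate):
    """Average frequency from rising zero crossings (linearly interpolated)."""
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            crossings.append(i - 1 + (-a) / (b - a))
    if len(crossings) < 2:
        return None
    return sample_rate * (len(crossings) - 1) / (crossings[-1] - crossings[0])

def autocorr_pitch(frame, sample_rate, fmin, fmax):
    """Pitch from the lag that maximizes the short-time autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    terms = len(frame) - max_lag
    best_lag = max(
        range(min_lag, max_lag + 1),
        key=lambda lag: sum(frame[i] * frame[i + lag] for i in range(terms)),
    )
    return sample_rate / best_lag

SAMPLE_RATE = 22000
F0 = 220.0  # fundamental that is absent from the signal

def sample(i):
    t = (i + 0.5) / SAMPLE_RATE  # half-sample offset keeps zeros off the sample grid
    return math.sin(2 * math.pi * 2 * F0 * t) + math.sin(2 * math.pi * 3 * F0 * t)

signal = [sample(i) for i in range(2000)]
zc = zero_crossing_frequency(signal, SAMPLE_RATE)
ac = autocorr_pitch(signal, SAMPLE_RATE, fmin=150.0, fmax=1000.0)
print(zc, ac)  # zero crossings land far above 220 Hz; autocorrelation stays near 220 Hz
```

The waveform still repeats every 1/220 s, which is what the autocorrelation picks up, but it crosses zero upward three times per period, so counting crossings reports roughly three times the pitch.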

--smb

