Components in Audio recognition - Part 1

Prabindh Sundareson, November 20, 2007, 6 comments

Audio recognition is the task of identifying a particular piece of audio (music, a ringtone, or speech) by comparing it against a given set of reference audio tracks.
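To make the definition concrete, here is a deliberately naive baseline. It is a minimal sketch, assuming raw mono samples held in Python lists. A query is identified by the highest normalized cross-correlation against each reference track. The function names `normalized_xcorr_peak` and `identify` are hypothetical. Real systems replace raw waveform matching with robust fingerprints that survive noise, re-encoding and tempo changes.

```python
import math


def normalized_xcorr_peak(query, reference):
    """Highest normalized correlation between `query` and any equal-length
    window of `reference`; 1.0 means an exact (scaled) match."""
    n = len(query)
    if n == 0 or n > len(reference):
        return -1.0
    q_mean = sum(query) / n
    q = [x - q_mean for x in query]  # remove DC offset
    q_norm = math.sqrt(sum(x * x for x in q))
    best = -1.0
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        w_mean = sum(window) / n
        w = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w))
        if q_norm == 0 or w_norm == 0:
            continue  # silent segment: correlation undefined, skip it
        score = sum(a * b for a, b in zip(q, w)) / (q_norm * w_norm)
        best = max(best, score)
    return best


def identify(query, references):
    """Return (name, score) of the reference track that best contains `query`.
    Hypothetical helper: brute-force search over every reference and offset."""
    scores = {name: normalized_xcorr_peak(query, ref) for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Usage: two synthetic "tracks" and a short excerpt taken from the first one.
rate = 8000
tone_a = [math.sin(2 * math.pi * 440 * t / rate) for t in range(800)]
tone_b = [math.sin(2 * math.pi * 660 * t / rate) for t in range(800)]
result = identify(tone_a[200:360], {"track_a": tone_a, "track_b": tone_b})
print(result)
```

The brute-force search costs O(N·M) per reference, which is exactly why practical systems index compact fingerprints instead of raw samples.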

The Human Auditory System (HAS) is remarkable in that tasks such as "familiarisation" with unknown tracks and finding "similar" tracks come naturally to us. Tunes from years past can still haunt the human brain long afterwards when a similar tune triggers the memory. The brain has been shown to store and respond to music differently from the way it processes speech and handles other tasks. The field of audio recognition tries to emulate this behaviour using concepts from biological modelling, signal processing theory and pattern recognition theory.

Audio recognition systems are used mainly to retrieve similar tracks from a database, for purposes such as copyright management and personal playlist management. A very different kind of system, based on "social rating", also exists: it relies on peer ratings of media files to decide where they belong. Such systems are not covered here, but will be used for comparison where relevant.

A typical audio recognition system consists of the following components.

  • A system that "stores" the archive of tracks that need to be managed. This could be a simple SQL database indexing files stored on a 100 TB server.
  • A system that "analyses" the archive, fingerprints the characteristics of each track, and forms "groups" or "sets" of tracks based on their overlapping characteristics. This typically draws on the modelling, signal processing, and pattern recognition fields.
  • A system that can "receive" an audio track that needs to be "placed" into one of the existing groups or sets. This is typically a front end (a user interface of some kind) followed by further signal processing blocks.
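The three components above can be sketched as a small toy pipeline under stated assumptions. An in-memory dict stands in for the SQL archive. A hypothetical two-number "fingerprint" (zero-crossing rate and RMS energy) stands in for real audio features. Plain k-means does the grouping. `TrackArchive`, `features`, `analyse` and `place` are illustrative names, not anything defined in the article.

```python
import math


def features(samples):
    """Hypothetical fingerprint: (zero-crossing rate, RMS energy). Needs >= 2 samples."""
    n = len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1)
    rms = math.sqrt(sum(x * x for x in samples) / n)
    return (zcr, rms)


class TrackArchive:
    """'Store' component: track name -> samples (stands in for the database)."""

    def __init__(self):
        self.tracks = {}

    def add(self, name, samples):
        self.tracks[name] = samples


def nearest(vec, centroids):
    """Index of the centroid closest to `vec` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))


def analyse(archive, k, iterations=10):
    """'Analyse' component: fingerprint every track, then group them with k-means."""
    names = sorted(archive.tracks)
    vecs = {name: features(archive.tracks[name]) for name in names}
    centroids = [vecs[name] for name in names[:k]]  # simple deterministic start
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for name in names:
            groups[nearest(vecs[name], centroids)].append(name)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(vecs[m][d] for m in g) / len(g) for d in range(2)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups


def place(samples, centroids):
    """'Receive' component: put a new track into the closest existing group."""
    return nearest(features(samples), centroids)


# Usage with synthetic tones: two low-pitched and two high-pitched "tracks".
rate = 8000


def tone(freq, n=800, amp=0.5):
    return [amp * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


archive = TrackArchive()
for name, freq in [("low_1", 200), ("low_2", 250), ("high_1", 1800), ("high_2", 2100)]:
    archive.add(name, tone(freq))

centroids, groups = analyse(archive, k=2)
query_group = groups[place(tone(300), centroids)]
print(query_group)
```

Swapping `features` for a richer descriptor (for example spectral or MFCC-based features) and the dict for a real database would leave the store / analyse / receive split unchanged.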

Portable implementations of the above are also possible, with smaller storage, more efficient but more limited analysis, and simpler front ends. These can be used, for example, in portable media players; the Rio Volt had an early implementation of such an interface.

In the next articles in this series, we will see how each of these components is typically implemented. We will also look at some reference implementations and discuss why one approach is better or worse than another. If you have a specific topic you would like discussed, email me at prabindh a't yahoo a't com.

For those of you looking for a place to start your scholarly searches, start at http://www.music-ir.org/

Looking forward to your feedback,


Comment by hinaH, November 8, 2013
Sir, my project is musical instrument identification using wavelets. I'm having problems preparing an audio source separation algorithm. Could you please tell me how to create MATLAB code for it? Thank you.
Comment by SteveSmith, November 24, 2007
Interesting topic! I’m always amazed (and depressed) that my eyes and ears can perform signal processing about a thousand times better than the algorithms I write. Thanks in advance for the articles.
Comment by v.kajen, December 3, 2009
Sir, I'm building a password unlocker using audio recognition as my 3rd-year DSP project. Can you help me distinguish mimicked voice signals from true signals? I'm confused at this stage.
Comment by jida, February 14, 2011
how amazing!
Comment by jida, February 14, 2011
Sir, I want to make a sample program for this. Can you please help me build the code for it? Thank you.
Comment by ank881, October 30, 2012
Hi Prabindh Sundareson, thanks for the article.
I am working on implementing a speech codec (ITU-based). Can you help me with how to proceed and get started in the right direction?
