Components in Audio recognition - Part 1
Audio recognition is the task of identifying a particular piece of audio (music, a ring-tone, or speech) from a given set of reference audio tracks.
The Human Auditory System (HAS) is remarkable in that becoming "familiar" with unknown tracks and finding "similar" tracks come naturally to us. Tunes from the not-so-recent past can still haunt the human brain many years later when triggered by a similar tune. The way the brain stores and responds to music has been shown to differ from the way it processes speech and other stimuli. The field of audio recognition tries to emulate this behaviour using concepts from biological modelling, signal processing theory and pattern recognition theory.
Audio recognition systems are used mainly to retrieve similar tracks from a database, for purposes such as copyright management and personal playlist management. A very different class of system based on "social rating" also exists, which relies on peer ratings of media files to decide where they belong. That approach is not covered in this series, but will be compared against where relevant.
A typical audio recognition system consists of the following components.
- A system that "stores" the archive of tracks that need to be managed. This could be a simple SQL database indexing files stored in a 100 TB server.
- A system that "analyses" the archive and fingerprints the characteristics of each track, and form various "groups" or "sets" of track based on their overlapping characteristics. This will typically include components from modeling, signal processing, and pattern recognition fields.
- A system that can "receive" a audio track that needs to be "placed" into one of the many given groups or sets. This is typically a front-end, that is an User-Interface of some kind, followed by more Signal Processing blocks.
Portable implementations of the above can be built with smaller storage and more efficient, but more limited, analysis capabilities and front ends. These can, for example, be used in portable media players; the Rio Volt had an early implementation of such an interface.
In the next articles in this series, we will see how each of these components is typically implemented. We will also look at some reference implementations and discuss why one approach is better or worse than another. If you have any specific topic you would like discussed, email me at prabindh a't yahoo a't com.
For those of you looking for a place to start your scholarly searches, start at http://www.music-ir.org/
Looking forward to your feedback,
Prabindh
Comments
I am working on implementing a speech codec (ITU-based). Can you help me with how to proceed and start in the correct direction?