vocal/non vocal segmentation using SVM classifier

Started by Rodion October 7, 2008
Hi all,

I'm working on a singer recognition project that identifies the singer in a
music recording via Multifeature Statistical Singer Modeling.

At this stage, I am trying to apply vocal/non-vocal segmentation using an SVM
classifier (Matlab - Sptoolbox).

I extracted features for each frame (spectral centroid, spectral flux, zero
crossings, and low energy) and tried to train the binary SVM classifier in a
2-dimensional space, for example zero crossing rate vs. spectral centroid,
but the error rate was too high (around 35%).

I also extracted MFCC coefficients from vocal/non-vocal regions, as
suggested in the article I am basing my work on ("Hybrid Singer Identifier",
John Shepherd), but I don't understand how I can train the 2-dimensional
binary SVM classifier using 14-dimensional feature vectors, or whether it is
even possible to perform the classification in 2D space.

Here are some of the articles I used for reference, but they didn't give
an answer or perhaps I didn't understand the point.
 
•	Mel frequency cepstral coefficients for music modeling

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.9216

•	LOCATING SINGING VOICE SEGMENTS WITHIN MUSIC SIGNALS

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.3067

•	Separation of vocals from polyphonic audio recordings

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.5580


I would appreciate it if someone could suggest other articles on this issue
or briefly explain how to train the binary SVM classifier with MFCC
coefficients in order to apply vocal/non-vocal segmentation.


Thanks in advance,

Rodion 
 



Sounds like a suitable question for comp.speech.research
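
On the 14-dimensional question itself: "binary" in a binary SVM refers to the number of classes (vocal vs. non-vocal), not to the number of input dimensions, so a 14-dimensional MFCC vector per frame can be used directly as one training sample. The 2-D case is mainly convenient for plotting. Below is a minimal Python sketch of that idea, not the Sptoolbox workflow from the original post. It assumes synthetic Gaussian vectors as a stand-in for real MFCC frames, and trains a linear SVM by plain stochastic subgradient descent on the hinge loss. The function names (make_frames, train_linear_svm, predict) and all parameter values are hypothetical, chosen only for illustration.

```python
import random

def make_frames(n_per_class, dim=14, seed=0):
    """Synthetic stand-in for per-frame MFCC vectors (NOT real audio data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (+1, -1):  # +1 = vocal frame, -1 = non-vocal frame
        for _ in range(n_per_class):
            # Each class is a Gaussian cloud centred at +1 or -1 in every dimension.
            X.append([rng.gauss(label * 1.0, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y

def decision(w, b, x):
    """Signed score w.x + b; works for any feature dimension."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=20, seed=0):
    """Linear SVM via stochastic subgradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision(w, b, X[i]) < 1.0
            # Shrink step from the L2 regulariser.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                # Hinge-loss subgradient step for samples inside the margin.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0.0 else -1

X_train, y_train = make_frames(200, seed=1)
X_test, y_test = make_frames(100, seed=2)
w, b = train_linear_svm(X_train, y_train)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(len(w), accuracy)
```

With real data, each row of X would be the MFCC vector of one labelled frame, and a kernel SVM from an existing toolbox would normally replace this hand-written linear trainer. The synthetic classes here are deliberately well separated, so the accuracy printed says nothing about what to expect on actual music.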

