Hi,

I am currently working on a project on classification and categorisation of speech. I have to sort speech samples into male or female, and into age categories. I am using pitch as the feature for this classification and have run into a few problems. It would be great to get some help on the following points:

1. I have done the pitch period estimation using the SIFT algorithm, but I have problems with the voiced/unvoiced decision and would like some help with it.
2. I am processing the signal frame by frame. Do I need to multiply each frame by a window such as Hanning?
3. I have also attempted speech enhancement using spectral subtraction, but I find that the enhancement alters the pitch of the sample.
4. My program has trouble differentiating between voiced and unvoiced frames, so it calculates the pitch inaccurately. What should I do?
5. With frame-wise processing, how do I arrive at a single pitch period value?

All of the coding has been done in C.

Thank you
Cheers
Swanand
Pitch problems
Started by ●March 6, 2005
Reply by ●March 7, 2005
Hi,

For the voiced/unvoiced decision, energy or zero-crossing rate can be used. This is discussed in "Digital Processing of Speech Signals" by Rabiner and Schafer. A high zero-crossing rate indicates an unvoiced frame and a low one indicates a voiced frame.

Pitch is not a constant parameter, so you can find the pitch for each window and take the maximum value. Then fix a threshold: depending on the range the pitch falls in, you can decide whether the speaker is male or female. (It's just an idea.)

For windowing, a Hamming window is generally used. (The window length and the amount of overlap have to be fixed.)

All the best
Venkat
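As a rough illustration of the energy and zero-crossing idea above, a per-frame voiced/unvoiced check in C might look like the sketch below. The function names and both thresholds are assumptions made for this example, not values from the thread or from the book; the thresholds in particular have to be tuned on real recordings.

```c
#include <stddef.h>

/* Short-time energy: mean of the squared samples in one frame. */
double frame_energy(const double *x, size_t n)
{
    double e = 0.0;
    for (size_t i = 0; i < n; i++)
        e += x[i] * x[i];
    return n ? e / (double)n : 0.0;
}

/* Zero-crossing rate: fraction of adjacent sample pairs whose signs differ. */
double frame_zcr(const double *x, size_t n)
{
    size_t crossings = 0;
    for (size_t i = 1; i < n; i++)
        if ((x[i - 1] >= 0.0) != (x[i] >= 0.0))
            crossings++;
    return n > 1 ? (double)crossings / (double)(n - 1) : 0.0;
}

/* Returns 1 if the frame looks voiced (high energy AND low ZCR), else 0.
 * energy_thresh and zcr_thresh are hypothetical tuning parameters. */
int is_voiced(const double *x, size_t n,
              double energy_thresh, double zcr_thresh)
{
    return frame_energy(x, n) > energy_thresh &&
           frame_zcr(x, n) < zcr_thresh;
}
```

Requiring both conditions is only one possible rule. Running the pitch estimator only on frames where `is_voiced` returns 1 keeps unvoiced frames from contributing meaningless pitch values.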
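To make the "one pitch value per recording, then a threshold" idea concrete, here is a small C sketch. Note that it swaps in the median of the voiced-frame pitch values instead of the maximum suggested above, on the assumption that a median is less affected by the occasional pitch-doubled frame. The function names and the split frequency are hypothetical.

```c
#include <stdlib.h>

/* Comparison callback for qsort on doubles. */
static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *)a, db = *(const double *)b;
    return (da > db) - (da < db);
}

/* Median of the per-frame pitch values (Hz) taken from voiced frames only.
 * Sorts f0 in place; returns 0.0 when there are no voiced frames. */
double median_pitch(double *f0, size_t n_voiced)
{
    if (n_voiced == 0)
        return 0.0;
    qsort(f0, n_voiced, sizeof f0[0], cmp_double);
    if (n_voiced % 2)
        return f0[n_voiced / 2];
    return 0.5 * (f0[n_voiced / 2 - 1] + f0[n_voiced / 2]);
}

/* Hypothetical single-threshold rule: 'F' above split_hz, otherwise 'M'.
 * split_hz is an assumption and should be chosen from labelled data. */
char guess_gender(double pitch_hz, double split_hz)
{
    return pitch_hz > split_hz ? 'F' : 'M';
}
```

For example, with per-frame values of 120, 118, 240, 122 and 119 Hz, the maximum would pick the 240 Hz outlier, while `median_pitch` returns 120 Hz.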
Reply by ●March 9, 2005
1. -
2. It seems that, for a better pitch estimate, it is useful to window the signals x and y in the following formula for the normalized correlation:

   Cp = sum_over_n(x(n)*y(p+n)) / sqrt( sum_over_n(x(n)^2) * sum_over_n(y(p+n)^2) )

   However, the additional computational cost is very high, since the signal y has to be windowed for every lag p. I guess this is why nobody windows the signals for pitch estimation.
3. Spectral subtraction may help slightly, and only for colored noise. In general the algorithm may corrupt vital features of the signal, which may lead to a different pitch.
4. -
5. There is a very strong pitch estimation algorithm that gives special treatment to pitch doubling. See the MELP speech coder.

Hope it helps a little,
Ilya Druker