Clean speech wav files
Started by ●April 3, 2006
Reply by ●April 4, 2006
> Yes. But make sure you compute your signal power the same way Matlab
> is computing the wgn power: Psignal = 10*log10(sum(x.*x)/length(x));
> --
> %    Randy Yates       % "How's life on earth?
> %%   Fuquay-Varina, NC %  ... What is it worth?"
> %%%  919-577-9882      % 'Mission (A World Record)',
> %%%% <yates@ieee.org>  % *A New World Record*, ELO
> http://home.earthlink.net/~yatescr

Thank you, Randy. There's one more thing that baffles me. I've searched the internet for the frequency range of human speech. One of the sites is
http://www.ergonomics4schools.com/lzone/noise.htm
Most sites say our normal speech range is 500~2000 Hz. If so, why are some of our wave files sampled at 8 kHz or 16 kHz, when anything above 4 kHz is enough to meet the Nyquist rate? Moreover, I intend to use a bandpass filter on my noisy speech before I proceed with speech detection, with cutoff frequencies at 400 Hz and 2100 Hz, to reduce some noise. Do you think it's feasible? Thanks.
Reply by ●April 4, 2006
http://sound.media.mit.edu/mpeg4/audio/sqam/ has a few samples that are 20 seconds or so long. Very clean and free to download.
--
Jon Harris
SPAM blocker in place: Remove 99 (but leave 7) to reply

"doggie" <elusivetruelove2003@yahoo.com> wrote in message news:qpudnUW_5usQn6zZRVn-rQ@giganews.com...
> Hi everyone, do any of you know where I can get clean speech wav files that
> are more than 10 seconds long? I need them for testing my speech detection
> algorithms. Thanks
Reply by ●April 4, 2006
doggie wrote:
> [...]
> Most sites say our normal speech range is between 500~2000 Hz. If so, why
> are some of our wave files sampled at 8kHz/16kHz where more than 4kHz is
> enough to meet the Nyquist rate? Moreover, I intend to use a bandpass
> filter on my noisy speech before I proceed with speech detection, cutoff
> frequency at 400 and 2100 Hz to reduce some noise. Do you think it's
> feasible? Thanks.

It is incorrect to say the speech range is 500-2000Hz. The *bulk* of the energy is in that range, but enough lies outside to make speech cut off at 2kHz sound rather poor. Telephones try to achieve at least 3kHz bandwidth, and the usual specified maximum for telephone lines is 3400Hz. Hence 8kHz sampling gives you that and a little headroom.

However, speech limited to 4kHz bandwidth still doesn't sound right. Try saying s, f, s, f, s, f over an ordinary PSTN phone and see if the recipient can tell which is which. Most of the energy differentiating those unvoiced sounds is in the 4kHz to 7kHz range. So 16kHz sampled voice gains significant clarity over normal telephone quality.

This isn't the end, though. Although 16kHz sampled voice has excellent clarity for pretty much the entire range of human speech, extending the upper bandwidth limit to 15kHz or more brings significant further improvement to the pleasantness of the sound. This reduces listener stress, so long conversations are less tiring.

So, the bandwidth you need depends rather a lot on what you are trying to achieve.

Steve
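The arithmetic behind "8kHz sampling gives you that and a little headroom" can be sketched in a few lines of Python. The helper names below are made up for illustration; the bandwidth figures are the ones quoted in the post above.

```python
def nyquist_rate(bandwidth_hz):
    """Twice the highest frequency of interest; the sample rate must exceed this."""
    return 2.0 * bandwidth_hz


def headroom_hz(sample_rate_hz, bandwidth_hz):
    """Gap between half the sample rate and the top of the wanted band."""
    return sample_rate_hz / 2.0 - bandwidth_hz


# (description, upper band edge in Hz, common sample rate in Hz)
cases = [
    ("telephone line, 3400 Hz", 3400.0, 8000.0),
    ("fricative energy up to 7 kHz", 7000.0, 16000.0),
]

for name, bandwidth, fs in cases:
    print(name,
          "| Nyquist rate:", nyquist_rate(bandwidth), "Hz",
          "| headroom at", fs, "Hz sampling:", headroom_hz(fs, bandwidth), "Hz")
```

The headroom matters because a real anti-aliasing filter needs some transition band between the top of the wanted signal and half the sample rate.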
Reply by ●April 4, 2006
Vowel formants in human speech are mostly between 200 Hz and 5500 Hz, but it is often possible to distinguish between vowels by identifying only the first two formants, because they are different for every vowel. The first two formants usually fall between 200 Hz and 3200 Hz. However, Steve Underwood is perfectly right that fricatives have most of their energy in the 4-7 kHz range.

Peter
Reply by ●April 4, 2006
>[...]
>So, the bandwidth you need depends rather a lot on what you are trying
>to achieve.
>
>Steve

Thank you, Steve. So it's a tradeoff between clarity and the memory space needed. And it seems I can only filter out 0~20Hz. You see, what I'm trying to do is...

cleanspeech=wavread("ajfhjkafh");
noise=wgn(..........);
noisyspeech=cleanspeech+noise;

I saw a few algorithms that do low-pass or high-pass filtering; they call it preprocessing. So before I implement my speech detection algorithm, I hope to do the same thing to reduce some noise and improve the effectiveness of my algorithm. Thanks
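For readers without Matlab, the three lines above can be approximated in plain Python. This is only a sketch under stated assumptions: a synthetic 440 Hz tone stands in for the speech file, `white_noise` is a hypothetical stand-in for `wgn` (taking its power argument as dBW into a 1 ohm load, which I understand to be the `wgn` default), and `power_db` mirrors the expression Randy gave.

```python
import math
import random


def power_db(x):
    """Mean power in dB, the same expression as 10*log10(sum(x.*x)/length(x))."""
    return 10.0 * math.log10(sum(v * v for v in x) / len(x))


def white_noise(n, power_dbw, rng):
    """n Gaussian samples whose expected mean power is power_dbw."""
    sigma = math.sqrt(10.0 ** (power_dbw / 10.0))
    return [rng.gauss(0.0, sigma) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
fs = 8000               # assumed sample rate in Hz

# Stand-in for the clean speech: one second of a 440 Hz tone.
clean = [0.05 * math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(fs)]

# Noise set 10 dB below the measured signal power, then mixed in.
noise = white_noise(len(clean), power_db(clean) - 10.0, rng)
noisy = [c + w for c, w in zip(clean, noise)]

# The measured SNR should come out close to the 10 dB target.
snr_db = power_db(clean) - power_db(noise)
print("signal power:", round(power_db(clean), 2), "dB, measured SNR:", round(snr_db, 2), "dB")
```

Because the noise is random, the measured SNR only approaches the target; it gets closer as the signal gets longer.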
Reply by ●April 4, 2006
"doggie" <elusivetruelove2003@yahoo.com> writes:
> [...]
> Most sites say our normal speech range is between 500~2000 Hz. If so, why
> are some of our wave files sampled at 8kHz/16kHz where more than 4kHz is
> enough to meet the Nyquist rate? Moreover, I intend to use a bandpass
> filter on my noisy speech before I proceed with speech detection, cutoff
> frequency at 400 and 2100 Hz to reduce some noise. Do you think it's
> feasible? Thanks.

2100 Hz is a bit low and 500 Hz is a bit high. However, this is really a judgement call. Phone systems (namely, landlines, and the GSM, AMPS, and DAMPS cell systems) typically use a bandwidth of 300 Hz to 3400 Hz. This is the minimum. In terms of human perception, more bandwidth is better, up to (I would say) 50 to 20000 Hz.

I don't know how your detection algorithm works. It could be the case that it can use the extra bandwidth to its advantage. If you could somehow establish that your detection algorithm doesn't use any information below f1 Hz nor any information above f2 Hz, then it would indeed be a good idea to bandpass filter the signal at f1 to f2 Hz before sending it to the detector.
--
%    Randy Yates       % "Rollin' and riding and slippin' and
%%   Fuquay-Varina, NC %  sliding, it's magic."
%%%  919-577-9882      %
%%%% <yates@ieee.org>  % 'Living' Thing', *A New World Record*, ELO
http://home.earthlink.net/~yatescr
Reply by ●April 4, 2006
"doggie" <elusivetruelove2003@yahoo.com> writes:
> [...]
> http://www.ergonomics4schools.com/lzone/noise.htm

This site states:

  The human voice produces frequencies between 500Hz and 2,000Hz.

I would say this is just plain wrong. As I stated before, the minimum that is accepted by industry is 300 Hz to 3400 Hz. This is one of the downsides of the internet - there is bad information out there as well as good.
--
%    Randy Yates       % "...the answer lies within your soul
%%   Fuquay-Varina, NC %  'cause no one knows which side
%%%  919-577-9882      %  the coin will fall."
%%%% <yates@ieee.org>  % 'Big Wheels', *Out of the Blue*, ELO
http://home.earthlink.net/~yatescr
Reply by ●April 4, 2006
Randy Yates <yates@ieee.org> writes:
> "doggie" <elusivetruelove2003@yahoo.com> writes:
>> [...]
>> http://www.ergonomics4schools.com/lzone/noise.htm
>
> This site states:
>
>   The human voice produces frequencies between 500Hz and 2,000Hz.
>
> I would say this is just plain wrong.

I should clarify that their statement is wrong assuming, by inference, that it *excludes* the frequencies outside the range 500 to 2000 Hz. It's not absolutely clear what sense they mean it in. At a minimum, the statement is unclear.

This statement is analogous to saying, "U.S. history covers the years 1970 to 1980." Well, it DOES cover those years, but it covers a lot of others as well.
--
%    Randy Yates       % "How's life on earth?
%%   Fuquay-Varina, NC %  ... What is it worth?"
%%%  919-577-9882      % 'Mission (A World Record)',
%%%% <yates@ieee.org>  % *A New World Record*, ELO
http://home.earthlink.net/~yatescr
Reply by ●April 4, 2006
>http://sound.media.mit.edu/mpeg4/audio/sqam/ has a few samples that are 20
>seconds or so long. Very clean and free to download.
>
>--
>Jon Harris

Great site!! Thanks a lot. This will be really helpful for me. Thank you, everyone.

Hmm.. most of the wave files I've got are sampled at 8kHz, so I guess all the information above 4kHz has been filtered away, formants and stuff like that.. Am I right? As I am using mostly energy-based detection algorithms, I don't think I can filter out any portion that may contain speech, or else the energy may drop and affect my algorithm.. I guess maybe I will try a high-pass filter to remove some noise at the 0~50Hz end..

My clean speech is -28 dB as calculated by the formula Randy gave previously, so to get an SNR of 10 dB I add noise using wgn(length(cleanspeech),1,-38). Did I do anything wrong? So far, the results for 10 and 5 dB SNR are still acceptable, except for some parts where noise is detected as speech, which I intend to get rid of by adding zero-crossing detection or something like that..

Thanks everyone. Experimenting with this stuff is quite fun and addictive. I guess I'm spending more time than I should.. *laugh* :)
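The SNR arithmetic in this post, and the zero-crossing idea mentioned at the end, might look roughly like this in Python. It is a sketch, not the poster's detector: the function names and the non-overlapping framing are assumptions made for illustration.

```python
import math


def noise_level_for_snr(signal_db, snr_db):
    """Noise power (dB) that puts the signal snr_db above the noise."""
    return signal_db - snr_db


def frame_features(x, frame_len):
    """Return (energy_db, zero_crossing_rate) for each non-overlapping frame."""
    features = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        energy = sum(v * v for v in frame) / frame_len
        energy_db = 10.0 * math.log10(energy) if energy > 0.0 else float("-inf")
        # Count sign changes between neighbouring samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
        features.append((energy_db, crossings / (frame_len - 1)))
    return features


# -28 dB speech at 10 dB SNR needs -38 dB noise, matching the wgn call above.
print(noise_level_for_snr(-28.0, 10.0))

# A rapidly alternating frame has the highest possible zero-crossing rate,
# while a constant frame has none.
print(frame_features([1.0, -1.0] * 4, 8))
print(frame_features([0.5] * 8, 8))
```

White noise tends to have a high zero-crossing rate and voiced speech a low one, so the rate can help reject noise-only frames that pass an energy threshold. Unvoiced fricatives also cross zero often, so the two features are probably best used together rather than the rate alone.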
Reply by ●April 4, 2006
doggie wrote:
...
> And it seems i can only filter out 0~20Hz.

How does that follow? There's very little energy below 80 Hz even in a basso profundo's lowest notes. Removing 50 or 60 Hz (depending on where one lives) is often salutary. In general, removing those bands with the lowest SNR is helpful. Listening to noise is tiring. Think of the fan in your kitchen.

Jerry
--
Engineering is the art of making what you want from things you can get.
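One way to try the low-frequency clean-up discussed here is a simple first-order high-pass filter. The Python sketch below assumes an 80 Hz cutoff and an 8 kHz sample rate purely for illustration, and the test tones are synthetic. A first-order filter rolls off gently (about 6 dB per octave), so it only partly attenuates 50 or 60 Hz hum; a steeper filter or a notch would be needed to remove hum more thoroughly.

```python
import math


def high_pass(x, cutoff_hz, sample_rate_hz):
    """First-order RC-style high-pass filter applied sample by sample."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    y = []
    prev_x = 0.0
    prev_y = 0.0
    for v in x:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        prev_y = alpha * (prev_y + v - prev_x)
        prev_x = v
        y.append(prev_y)
    return y


def rms(x):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


fs = 8000  # assumed sample rate in Hz


def tone(freq_hz):
    """One second of a unit-amplitude sine at freq_hz."""
    return [math.sin(2.0 * math.pi * freq_hz * n / fs) for n in range(fs)]


# Gain of the filter at a rumble frequency and at a mid-speech frequency.
rumble_gain = rms(high_pass(tone(10.0), 80.0, fs)) / rms(tone(10.0))
speech_gain = rms(high_pass(tone(1000.0), 80.0, fs)) / rms(tone(1000.0))
print("gain at 10 Hz:", round(rumble_gain, 3), "gain at 1000 Hz:", round(speech_gain, 3))
```

For an energy-based detector it may be worth measuring the speech energy before and after filtering, to confirm that the chosen cutoff removes noise without lowering the speech level enough to upset the thresholds.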






