I'm a total newbie with DSP stuff and need some advice.

I'm implementing an automatic clay release system for my shooting club with a dsPIC30F. First I'm writing prototype software for the PC in C/C++.

What it needs to do is detect any kind of yelling sound made by a human, over the shotgun bangs, metallic loading 'clicks' and wind humming, as fast as possible. After the yell is detected, there can be several seconds of delay before the next clay, so I don't have to worry about detecting when the sound ends, or stuff like that.

The loading sound and wind part is easy, I think; an LP filter will do it?

But the shot sound is more difficult, because it fills the entire frequency range of human sounds pretty evenly.

I've analyzed several yelling and shooting samples with Audacity. In the pitch view (which uses EAC, enhanced autocorrelation) there is a very clearly visible difference between all human sounds and shots. In human sounds there are clearly visible pitches, which produce nice curves on the screen, while in shots there are very thin lines which are absolutely straight.

These curves are clearly visible even in the sample where a yell is mixed with several bang sounds, and there doesn't seem to be any visible difference in the normal spectrogram view.

I am mostly a graphics programmer, and noticed that hey, if I can see the difference that easily on the screen, I can easily detect it from the data behind the image as well. The idea is to use a simplified fill algorithm which scans the vertical lines (frames, when speaking in 'DSP'?) in real time as they arrive, searches for the pitches, compares them to the results of the previous line(s), and that way sees if there are curves in the image.

Does this make any sense? Is there an easier/smarter way to do this?

Then I started to look for pitch detection algorithms which I could use to produce an 'image' similar to Audacity's EAC. I figured that the cepstrum could be a good choice? AFAIK it's simply an FFT of the result of an FFT? I could easily do an FFT twice in real time with the dsPIC DSP library; is it really that simple?

I couldn't find any pitch images produced by the cepstrum; are they similar to the ones made by EAC?

Do you think this is a good way of detecting human yells over bangs, or is there a better way?
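On the "FFT of an FFT" question, a minimal sketch may help. The real cepstrum is usually defined as the inverse FFT of the log magnitude of the FFT; because the log magnitude of a real signal is real and symmetric, a second forward FFT gives the same values up to a scale factor, so "FFT twice, with log-magnitude in between" is close to the truth. The code below is Python for PC prototyping, not dsPIC code. The function names, the 8 kHz sample rate, the 1024-sample frame and the 80 to 500 Hz search range are all assumptions made for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def cepstrum_pitch(frame, fs, fmin=80.0, fmax=500.0):
    """Return (pitch_hz, peak_value) for one frame of samples.

    fmin/fmax bound the search to a plausible voice range (an assumption).
    """
    n = len(frame)
    # Hann window to reduce spectral leakage.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # First FFT, then log magnitude (the small constant avoids log(0)).
    log_mag = [math.log(abs(c) + 1e-9) for c in fft(windowed)]
    # Second FFT. log_mag is real and symmetric, so the forward FFT
    # divided by n equals the inverse FFT, i.e. the real cepstrum.
    ceps = [c.real / n for c in fft(log_mag)]
    # A periodic signal shows a peak at the quefrency (lag in samples)
    # equal to its pitch period; search only the allowed range.
    q_lo, q_hi = int(fs / fmax), int(fs / fmin)
    q = max(range(q_lo, q_hi + 1), key=lambda i: ceps[i])
    return fs / q, ceps[q]


if __name__ == "__main__":
    fs = 8000
    # Synthetic frame: pulse train with a 40-sample period (200 Hz at 8 kHz).
    frame = [1.0 if i % 40 == 0 else 0.0 for i in range(1024)]
    print(cepstrum_pitch(frame, fs))
```

Calling `cepstrum_pitch` on each incoming frame and plotting the peak quefrency over time would give one column per frame, which is roughly the kind of picture the pitch view draws. How closely it matches the EAC display on real range recordings is something only the recordings can answer.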
Detect human yell in very noisy environment (shooting yard)
Started by ●December 14, 2007
Reply by ●December 14, 2007

mdmx wrote:
> The loading sound and wind part is easy, I think; an LP filter will do it?
>
> But the shot sound is more difficult, because it fills the entire frequency range of human sounds pretty evenly.
>

Wind is also wide-band. If I were you, I would try detecting something other than a human yell - perhaps a blowing whistle instead.

--
Jim Thomas
Principal Applications Engineer
Bittware, Inc
jthomas@bittware.com
http://www.bittware.com
(603) 226-0404 x536
Any sufficiently advanced technology is indistinguishable from magic. - Arthur C. Clarke
Reply by ●December 14, 2007

mdmx wrote:
> I'm a total newbie with DSP stuff and need some advice.
>
> I'm implementing an automatic clay release system for my shooting club with a dsPIC30F. First I'm writing prototype software for the PC in C/C++.
>
> What it needs to do is detect any kind of yelling sound made by a human, over the shotgun bangs, metallic loading 'clicks' and wind humming, as fast as possible. After the yell is detected, there can be several seconds of delay before the next clay, so I don't have to worry about detecting when the sound ends, or stuff like that.

I would recommend doing it all in one step. Human vowels are characterized by a typical spectral distribution. The absolute frequencies of this distribution vary from person to person and from time to time, but the relative footprint is always similar. As far as I know, vowels have two main spectral maxima (formants). Their frequencies define which vowel it is (A, O, U etc.). Furthermore, this sound will not change too fast over time.

I would look at algorithms for speech detection and extract the part that deals with vowels.

Marcel
Reply by ●December 14, 2007

mdmx wrote:
> I'm a total newbie with DSP stuff and need some advice.

You are underestimating yourself. I like your systematic approach.

> I'm implementing an automatic clay release system for my shooting club with a dsPIC30F. First I'm writing prototype software for the PC in C/C++.

Good idea.

> What it needs to do is detect any kind of yelling sound made by a human, over the shotgun bangs, metallic loading 'clicks' and wind humming, as fast as possible. After the yell is detected, there can be several seconds of delay before the next clay, so I don't have to worry about detecting when the sound ends, or stuff like that.
>
> The loading sound and wind part is easy, I think; an LP filter will do it?

For your purpose, you can bandpass from 300 Hz to 1.5 kHz. However, a lot of noise falls into this band, too.

> But the shot sound is more difficult, because it fills the entire frequency range of human sounds pretty evenly.
>
> I've analyzed several yelling and shooting samples with Audacity. In the pitch view (which uses EAC, enhanced autocorrelation) there is a very clearly visible difference between all human sounds and shots. In human sounds there are clearly visible pitches, which produce nice curves on the screen, while in shots there are very thin lines which are absolutely straight.
>
> These curves are clearly visible even in the sample where a yell is mixed with several bang sounds, and there doesn't seem to be any visible difference in the normal spectrogram view.
>
> I am mostly a graphics programmer, and noticed that hey, if I can see the difference that easily on the screen, I can easily detect it from the data behind the image as well. The idea is to use a simplified fill algorithm which scans the vertical lines (frames, when speaking in 'DSP'?) in real time as they arrive, searches for the pitches, compares them to the results of the previous line(s), and that way sees if there are curves in the image.
>
> Does this make any sense? Is there an easier/smarter way to do this?

It makes perfect sense. However, this approach can also trigger on the sounds of birds, a cat meowing, music, etc. Distinguishing the sounds is the difficult task.

> Then I started to look for pitch detection algorithms which I could use to produce an 'image' similar to Audacity's EAC. I figured that the cepstrum could be a good choice? AFAIK it's simply an FFT of the result of an FFT? I could easily do an FFT twice in real time with the dsPIC DSP library; is it really that simple?

The cepstrum may work too; however, if EAC is working, why not simply use it?

> I couldn't find any pitch images produced by the cepstrum; are they similar to the ones made by EAC?

It is all the same. You are just looking at the same signal from different sides.

> Do you think this is a good way of detecting human yells over bangs, or is there a better way?

The tough part is distinguishing the human yells from the other similar sounds.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
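The "compare each frame with the previous one" idea can be sketched as follows. This is an illustration under stated assumptions, not Audacity's EAC: it uses a plain normalized autocorrelation per frame, and every name and threshold (`frame_pitch`, `detect_yell`, the 0.5 strength threshold, the 15% allowed pitch jump, three consecutive frames, the 80 to 500 Hz range) is hypothetical and would need tuning on real range recordings. Restricting the pitch range is one crude way to reject some of the non-human pitched sounds mentioned above, but it will not separate a yell from everything else that has a pitch.

```python
import math


def frame_pitch(frame, fs, fmin=80.0, fmax=500.0):
    """Return (pitch_hz, strength) from the normalized autocorrelation peak."""
    energy = sum(s * s for s in frame)
    if energy <= 0.0:
        return 0.0, 0.0
    best_lag, best_r = 0, 0.0
    for lag in range(int(fs / fmax), int(fs / fmin) + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        r = sum(frame[i] * frame[i + lag]
                for i in range(len(frame) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return (fs / best_lag if best_lag else 0.0), best_r


def detect_yell(samples, fs, frame_len=512, hop=256,
                min_strength=0.5, max_jump=0.15, min_frames=3):
    """True if enough consecutive frames carry a strong, smoothly moving pitch."""
    run, prev = 0, 0.0
    for start in range(0, len(samples) - frame_len + 1, hop):
        pitch, strength = frame_pitch(samples[start:start + frame_len], fs)
        voiced = strength >= min_strength
        # "Smooth" means the pitch moved by at most max_jump (relative)
        # since the previous voiced frame, i.e. the curve is continuous.
        smooth = prev == 0.0 or abs(pitch - prev) <= max_jump * prev
        if voiced and smooth:
            run += 1
        elif voiced:
            run = 1  # pitch jumped: start a new candidate curve
        else:
            run = 0  # no clear pitch in this frame
        prev = pitch if voiced else 0.0
        if run >= min_frames:
            return True
    return False
```

With 512-sample frames and a 256-sample hop at 8 kHz, three consecutive frames correspond to roughly 130 ms of audio, so the frame count directly trades detection delay against false triggers.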
Reply by ●December 14, 2007

On Fri, 14 Dec 2007 07:27:42 -0600, "mdmx" <juha.mattila@netti.fi> wrote:

>I'm a total newbie with DSP stuff and need some advice.
>
>I'm implementing an automatic clay release system for my shooting club with a dsPIC30F. First I'm writing prototype software for the PC in C/C++.
>
>What it needs to do is detect any kind of yelling sound made by a human, over the shotgun bangs, metallic loading 'clicks' and wind humming, as fast as possible. After the yell is detected, there can be several seconds of delay before the next clay, so I don't have to worry about detecting when the sound ends, or stuff like that.
>
>The loading sound and wind part is easy, I think; an LP filter will do it?
>
>But the shot sound is more difficult, because it fills the entire frequency range of human sounds pretty evenly.
>
>I've analyzed several yelling and shooting samples with Audacity. In the pitch view (which uses EAC, enhanced autocorrelation) there is a very clearly visible difference between all human sounds and shots. In human sounds there are clearly visible pitches, which produce nice curves on the screen, while in shots there are very thin lines which are absolutely straight.
>
>These curves are clearly visible even in the sample where a yell is mixed with several bang sounds, and there doesn't seem to be any visible difference in the normal spectrogram view.
>
>I am mostly a graphics programmer, and noticed that hey, if I can see the difference that easily on the screen, I can easily detect it from the data behind the image as well. The idea is to use a simplified fill algorithm which scans the vertical lines (frames, when speaking in 'DSP'?) in real time as they arrive, searches for the pitches, compares them to the results of the previous line(s), and that way sees if there are curves in the image.
>
>Does this make any sense? Is there an easier/smarter way to do this?

I think I'd just use a foot switch.

>Then I started to look for pitch detection algorithms which I could use to produce an 'image' similar to Audacity's EAC. I figured that the cepstrum could be a good choice? AFAIK it's simply an FFT of the result of an FFT? I could easily do an FFT twice in real time with the dsPIC DSP library; is it really that simple?
>
>I couldn't find any pitch images produced by the cepstrum; are they similar to the ones made by EAC?
>
>Do you think this is a good way of detecting human yells over bangs, or is there a better way?
>

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org
Reply by ●December 14, 2007

Eric Jacobsen wrote:

...

> I think I'd just use a foot switch.

Spoilsport! A /Pull!/ detector is *so* much more elegant.

...

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by ●December 14, 2007

Jerry Avins wrote:
> Eric Jacobsen wrote:
>
> ...
>
>> I think I'd just use a foot switch.
>
> Spoilsport! A /Pull!/ detector is *so* much more elegant.
>
> ...
>
> Jerry

So there would be only ONE shooter on the range at a time???

If there are more, then a headset mic would be the only way to detect a "PULL!" for each clay release.

donald
Reply by ●December 14, 2007

>Jerry Avins wrote:
>> Eric Jacobsen wrote:
>>
>> ...
>>
>>> I think I'd just use a foot switch.
>>
>> Spoilsport! A /Pull!/ detector is *so* much more elegant.
>>
>> ...
>>
>> Jerry
>
>So there would be only ONE shooter on the range at a time???
>
>If there are more, then a headset mic would be the only way to detect a "PULL!" for each clay release.
>
>donald
>

You have to leverage knowledge where it is available. As the poster has more familiarity with graphics, why not just use a digital camera to identify which shooter is speaking, through face detection and then checking for open mouths?

Actually, I'm not thinking elegantly enough. Do away with the whole audio processing bit, and just use lip reading to detect the word "pull".

In all seriousness, whenever I've been on a range (once or twice at scout camp), it was very bad form to call for the skeet if there were people on the range who didn't know it was explicitly meant for you. In that situation, it wouldn't matter who yelled. I'm not sure how it would work on a larger recreational range.
Reply by ●December 14, 2007

On Fri, 14 Dec 2007 11:09:42 -0700, donald <Donald@dontdoithere.com> wrote:

>Jerry Avins wrote:
>> Eric Jacobsen wrote:
>>
>> ...
>>
>>> I think I'd just use a foot switch.
>>
>> Spoilsport! A /Pull!/ detector is *so* much more elegant.
>>
>> ...
>>
>> Jerry
>
>So there would be only ONE shooter on the range at a time???

How's that different from the proposed system? What prevents multiple people from saying "Pull!" at the same time? Multiple foot switches just ORed together would work, too, from my understanding of the possible architectures.

Switches are a lot cheaper than what's proposed.

>If there are more, then a headset mic would be the only way to detect a "PULL!" for each clay release.

I don't see the advantage other than as an academic exercise. What's the difference between anyone on the range saying "Pull!" or stepping on the switch? The switch won't have the detection/false alarm problems, or the reliability or cost issues that an audio processor would have, I'd bet.

If the idea is really an academic exercise, then it's an interesting one, since the audio will have to be pulled (heh) out of a noisy environment. For a practical system, it's not so great.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org
Reply by ●December 14, 2007

Eric Jacobsen wrote:
> On Fri, 14 Dec 2007 11:09:42 -0700, donald <Donald@dontdoithere.com> wrote:
>
>> Jerry Avins wrote:
>>> Eric Jacobsen wrote:
>>>
>>> ...
>>>
>>>> I think I'd just use a foot switch.
>>>
>>> Spoilsport! A /Pull!/ detector is *so* much more elegant.
>>>
>>> ...
>>>
>>> Jerry
>> So there would be only ONE shooter on the range at a time???
>
> How's that different from the proposed system? What prevents multiple people from saying "Pull!" at the same time? Multiple foot switches just ORed together would work, too, from my understanding of the possible architectures.
>
> Switches are a lot cheaper than what's proposed.

No expense should be spared here.

>> If there are more, then a headset mic would be the only way to detect a "PULL!" for each clay release.
>
> I don't see the advantage other than as an academic exercise. What's the difference between anyone on the range saying "Pull!" or stepping on the switch?

Tradition.

> The switch won't have the detection/false alarm problems, or the reliability or cost issues that an audio processor would have, I'd bet.

A minor detail compared to scrapping tradition. Hollering PULL! and seeing the skeet fly gives the shooter such a powerful sense of command!

> If the idea is really an academic exercise, then it's an interesting one, since the audio will have to be pulled (heh) out of a noisy environment. For a practical system, it's not so great.

Jerry
--
Engineering is the art of making what you want from things you can get.






