Newbie Q: Goertzel threshold testing, speech detection

Started by Zack Angelo April 10, 2004
Hi, 

I'm currently using the Goertzel algorithm to do DTMF detection. 
Currently, I get the magnitude for each DTMF frequency, save the sums
of the magnitudes that correspond to each number, then I assume
whichever sum is the highest must be the most likely DTMF key.
This works, but seems kludgy.  I do it this way because I'm having a
hard time determining an appropriate threshold value to use to
eliminate frequencies that aren't present (and I'm not even having to
add any robustness against noise yet!).  And although it probably
won't be a problem for my specific implementation, it also seems that
the threshold scale changes pretty dramatically when you move from one
block size to another.  How do you guys traditionally do threshold
testing for DFT coefficients?

I'm also interested in doing speech detection with Goertzel. I've read
that in addition to getting the 8 coefficients for DTMF detection, you
can also detect the presence of speech with an additional 8. Does
anyone know what the additional 8 frequencies are, and are there any
specific ways I'd have to modify my decision logic to do this (i.e.,
do all eight frequencies have to be present, and what frequencies
would I have to compare against if I were doing relative threshold
testing)?

Thanks in advance,
Zack Angelo
Zack Angelo wrote:
> Hi,
>
> I'm currently using the Goertzel algorithm to do DTMF detection.
> Currently, I get the magnitude for each DTMF frequency, save the sums
> of the magnitudes that correspond to each number, then I assume
> whichever sum is the highest must be the most likely DTMF key.
> This works, but seems kludgy. I do it this way because I'm having a
> hard time determining an appropriate threshold value to use to
> eliminate frequencies that aren't present (and I'm not even having to
> add any robustness against noise yet!). And although it probably
> won't be a problem for my specific implementation, it also seems that
> the threshold scale changes pretty dramatically when you move from one
> block size to another. How do you guys traditionally do threshold
> testing for DFT coefficients?

The way I've seen it done (and done it myself) is by making sure that only one row frequency and one column frequency are "present" - "present" being defined as "above a threshold." The ITU specs (Q.21 & Q.23? there are two DTMF specs) will tell you what that threshold is (probably in dBm), and you'll have to figure out what that translates to in your codec (G.711 might help in that regard too, if you're using A-law or mu-law).

The DTMF specs also define a value for "twist" - I can't remember the value in the spec, but twist means that the two tones that make up the DTMF pair must be close in amplitude to one another.
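
A minimal Python sketch of the check described above, offered as an illustration rather than a spec-compliant detector. The raw Goertzel squared magnitude for a steady tone of amplitude A grows roughly as (N*A/2)^2, which is why a fixed threshold shifts with block size; dividing by N^2 removes that dependence. The 8 kHz sample rate, the `threshold` and `max_twist_db` values, and the function names are all assumptions made for this example, not figures from the ITU documents.

```python
import math

FS = 8000  # sample rate in Hz (assumed; typical for telephony)
ROWS = (697, 770, 852, 941)        # DTMF row (low group) frequencies
COLS = (1209, 1336, 1477, 1633)    # DTMF column (high group) frequencies
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, fs=FS):
    """Return roughly A^2 for a sine of amplitude A at `freq`, for any block size."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    mag_sq = s1 * s1 + s2 * s2 - coeff * s1 * s2  # |X(f)|^2, scales like N^2
    return 4.0 * mag_sq / (n * n)                 # normalise out the block size


def detect_key(samples, threshold=0.01, max_twist_db=8.0):
    """Return the DTMF key, or None if the block does not look like DTMF.

    `threshold` (in amplitude-squared units) and `max_twist_db` are
    placeholder values; real ones come from the applicable spec and codec.
    """
    row_p = [goertzel_power(samples, f) for f in ROWS]
    col_p = [goertzel_power(samples, f) for f in COLS]
    rows_hit = [i for i, p in enumerate(row_p) if p > threshold]
    cols_hit = [i for i, p in enumerate(col_p) if p > threshold]
    # Exactly one row tone and one column tone must be "present".
    if len(rows_hit) != 1 or len(cols_hit) != 1:
        return None
    r, c = rows_hit[0], cols_hit[0]
    # Twist: the level difference between the two tones, in dB.
    twist_db = abs(10.0 * math.log10(row_p[r] / col_p[c]))
    if twist_db > max_twist_db:
        return None
    return KEYS[r][c]
```

Because the power is normalised, the same `threshold` can be reused if the block length changes, for example from 205 to 410 samples.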

>
> I'm also interested in doing speech detection with Goertzel. I've read
> that in addition to getting the 8 coefficients for DTMF detection, you
> can also detect the presence of speech with an additional 8. Does
> anyone know what the additional 8 frequencies are, and are there any
> specific ways I'd have to modify my decision logic to do this (i.e.,
> do all eight frequencies have to be present, and what frequencies
> would I have to compare against if I were doing relative threshold
> testing)?

I think the speech detection you're talking about here is really for false-DTMF tone rejection. The app notes from ADI and TI will have you check not only for the presence of two DTMF tones (one row, one column), but also the absence of their second harmonics. If the DTMF frequency is present and if its second harmonic is also present, it should not be considered DTMF because it is more likely speech. This isn't so much a measure of the likelihood of speech as it is a measure of the unlikelihood of DTMF.

--
Jim Thomas
Principal Applications Engineer
Bittware, Inc
jthomas@bittware.com
http://www.bittware.com
(703) 779-7770
Nothing is ever so bad that it can't get worse. - Calvin
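
The second-harmonic rejection described in the reply above could look roughly like the sketch below. If the "additional 8" frequencies are simply twice the DTMF frequencies, they would be 1394, 1540, 1704, 1882, 2418, 2672, 2954 and 3266 Hz, and only the two belonging to the detected row and column need to be checked. The `max_ratio` limit, the 8 kHz sample rate and the function names are assumptions for illustration; the ADI and TI app notes may use different criteria.

```python
import math


def goertzel_power(samples, freq, fs=8000):
    """Block-size-normalised Goertzel power (about A^2 for a sine of amplitude A)."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return 4.0 * (s1 * s1 + s2 * s2 - coeff * s1 * s2) / (n * n)


def passes_harmonic_check(samples, row_freq, col_freq, max_ratio=0.1, fs=8000):
    """Return False if either detected tone has a strong second harmonic.

    Intended to run only after a row and a column tone have already been
    found. `max_ratio` (harmonic power over fundamental power) is a
    placeholder value, not one taken from any app note.
    """
    for f in (row_freq, col_freq):
        fundamental = goertzel_power(samples, f, fs)
        harmonic = goertzel_power(samples, 2 * f, fs)
        if harmonic > max_ratio * fundamental:
            return False  # harmonic content suggests speech, not DTMF
    return True
```

Note that this is a relative test: each harmonic is compared against its own fundamental, so no absolute threshold is needed for the extra frequencies.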
Jim Thomas <jthomas@bittware.com> wrote in message news:<107l9h4eupiboaf@corp.supernews.com>...
> Zack Angelo wrote:
> > Hi,
> >
> > I'm currently using the Goertzel algorithm to do DTMF detection.
> > Currently, I get the magnitude for each DTMF frequency, save the sums
> > of the magnitudes that correspond to each number, then I assume
> > whichever sum is the highest must be the most likely DTMF key.
> > This works, but seems kludgy. I do it this way because I'm having a
> > hard time determining an appropriate threshold value to use to
> > eliminate frequencies that aren't present (and I'm not even having to
> > add any robustness against noise yet!). And although it probably
> > won't be a problem for my specific implementation, it also seems that
> > the threshold scale changes pretty dramatically when you move from one
> > block size to another. How do you guys traditionally do threshold
> > testing for DFT coefficients?
>
> The way I've seen it done (and done it myself) is by making sure that
> only one row frequency and one column frequency are "present" -
> "present" being defined as "above a threshold." The ITU specs (Q.21 &
> Q.23? there are two DTMF specs) will tell you what that threshold is
> (probably in dBm), and you'll have to figure out what that translates
> to in your codec (G.711 might help in that regard too, if you're using
> A-law or mu-law).
>
> The DTMF specs also define a value for "twist" - I can't remember the
> value in the spec, but twist means that the two tones that make up the
> DTMF pair must be close in amplitude to one another.

These specs actually vary a bit from country to country. Some countries believe their phone networks are so bad they need greater twist tolerance :-)

> >
> > I'm also interested in doing speech detection with Goertzel. I've read
> > that in addition to getting the 8 coefficients for DTMF detection, you
> > can also detect the presence of speech with an additional 8. Does
> > anyone know what the additional 8 frequencies are, and are there any
> > specific ways I'd have to modify my decision logic to do this (i.e.,
> > do all eight frequencies have to be present, and what frequencies
> > would I have to compare against if I were doing relative threshold
> > testing)?
>
> I think the speech detection you're talking about here is really for
> false-DTMF tone rejection. The app notes from ADI and TI will have you
> check not only for the presence of two DTMF tones (one row, one column),
> but also the absence of their second harmonics. If the DTMF frequency
> is present and if its second harmonic is also present, it should not be
> considered DTMF because it is more likely speech. This isn't so much a
> measure of the likelihood of speech as it is a measure of the
> unlikelihood of DTMF.

The 2nd harmonic test is a good way to do things if you are making a detector for PSTN use, where it must tolerate some of your dial tone spilled back from the far end. If it's more like an IVR app with an echo canceller, a better test is to check that the energy in the row and column hits is a large percentage of the total signal energy. That is extremely speech immune.

Regards,
Steve
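
A rough sketch of the energy-fraction test suggested above. For a sine of amplitude A over N samples, the time-domain energy is about N*A^2/2 and the Goertzel squared magnitude is about (N*A/2)^2, so 2*|X|^2/N estimates the energy carried by that tone. What counts as "a large percentage" is left to the caller; the sample rate and function names are assumptions for this example.

```python
import math


def goertzel_mag_sq(samples, freq, fs=8000):
    """Unnormalised Goertzel squared magnitude |X(freq)|^2 for one block."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def tone_energy_fraction(samples, row_freq, col_freq, fs=8000):
    """Estimate the share of block energy carried by the two detected tones.

    Close to 1.0 for clean DTMF; noticeably lower when speech or other
    signals contribute energy. Leakage can push the estimate slightly
    above 1.0 on short blocks.
    """
    n = len(samples)
    total = sum(x * x for x in samples)
    if total == 0.0:
        return 0.0  # silent block: nothing to attribute to the tones
    tone = sum(2.0 * goertzel_mag_sq(samples, f, fs) / n
               for f in (row_freq, col_freq))
    return tone / total
```

A caller would accept the key only when the returned fraction exceeds some limit chosen for the application, which takes the place of checking eight extra harmonic frequencies.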