Reply by bharat pathak October 27, 2010
I have made a word recognition system which works by taking
the speech samples, passing them through filter banks and then
decimating. At the end of it, a 2-dimensional template is
generated, which is time averaged (time averaging is done
only during the training phase).

When the actual speech sample comes in, its template is
computed and compared with each averaged reference template;
the one with the minimum error is the closest match.
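
The training and matching steps described above could be sketched roughly as follows. This is only an illustration, not Bharat's actual code: it assumes every template is an equally sized list of rows (bands x frames) that has already come out of the filter bank and decimation stages, and it assumes mean squared error as the "error", since the post does not name the measure. The function names and the toy numbers are made up for the example.

```python
def average_templates(templates):
    """Element-wise mean of equally sized 2-D templates (bands x frames)."""
    count = len(templates)
    bands, frames = len(templates[0]), len(templates[0][0])
    return [[sum(t[b][f] for t in templates) / count for f in range(frames)]
            for b in range(bands)]


def template_error(a, b):
    """Mean squared difference between two equally sized templates (assumed metric)."""
    total = sum((x - y) ** 2
                for row_a, row_b in zip(a, b)
                for x, y in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


def closest_word(template, references):
    """Label of the reference template with the smallest error."""
    return min(references, key=lambda word: template_error(template, references[word]))


# Training phase: average several example templates per word (toy 2x2 data).
training = {
    "yes": [[[1.0, 0.2], [0.1, 0.9]], [[0.8, 0.4], [0.3, 0.7]]],
    "no": [[[0.1, 0.9], [1.0, 0.2]], [[0.3, 0.7], [0.8, 0.4]]],
}
references = {word: average_templates(examples) for word, examples in training.items()}

# Recognition phase: compare the incoming template with every reference.
incoming = [[0.85, 0.25], [0.15, 0.75]]
print(closest_word(incoming, references))
```

Real utterances vary in length, so some form of time normalization would be needed before templates can be compared cell by cell like this.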

Matlab and C code available on sale. Check for arithos designs
to get in touch.

Regards
Bharat
Reply by HardySpicer October 27, 2010
On Oct 28, 9:47 am, Tim Wescott <t...@seemywebsite.com> wrote:
> On 10/27/2010 11:26 AM, glen herrmannsfeldt wrote:
> > Tim Wescott<t...@seemywebsite.com> wrote:
> > (snip)
> >
> >> Picking out two different recordings, made in two different places, of
> >> the same sound source may be possible, but you'd have all sorts of
> >> complications because of different acoustics causing different echos and
> >> reverberations.
> >
> >> But two _different_ people saying the same word? Oh man -- that's a
> >> task that humans don't always get right; getting a machine that could do
> >> it reliably would require a team of people for a good amount of time.
> >> I'm not even sure it's been done, but if it is it's being done by a
> >> well-connected researcher who's good a writing grant proposals and at
> >> getting work out of his grad students.
> >
> > It seems that it is good enough for companies to (try to) use it.
> >
> > More and more phone response systems, such as banks and airlines,
> > are using it. I usually find it easier to put in the account
> > number or flight number using the keypad, but they expect one
> > to "say" the account or flight number. Sometimes it gets it right,
> > other times not.
> >
> > I remember one about 30 years agot that would do one digit math
> > problems, and ask for the answer. Even with only ten choices,
> > it got it wrong fairly often.
>
> Independent speaker recognition of just a few words in one language is
> heaps more reliable than independent speaker recognition of any random
> utterance in any arbitrary language.
>
> --
For a fixed vocab it could be made to work. Ex: president, bomb, explode, meeting, kill and such like.
Reply by HardySpicer October 27, 2010
On Oct 28, 1:47 am, nagarajan karunakaran <prassa...@gmail.com> wrote:
> Hi, i am working on a concept where i want to compare two audio file
> (content of audio file)is identical to some extent.For ex if it's two
> different people saying the same words .I have to say they are equal.
> I have done goggling but i not able to predict what is really needed
> for my project. I don't know ,where to start and how to start.How can
> i compare audio files in that manner.Please help me out with any idea
> about it.What is the correct approach for my concept.
Tricky. You need speech recognition, and then you compare the text (as the vampyre has already pointed out). Speech recognition needs training to be good, so such a system can never be accurate unless the speakers are people who have used the equipment a priori. Hard to do with total strangers. You can rest, Osama.

Hardy
Reply by HardySpicer October 27, 2010
On Oct 28, 1:47 am, nagarajan karunakaran <prassa...@gmail.com> wrote:
> Hi, i am working on a concept where i want to compare two audio file
> (content of audio file)is identical to some extent.For ex if it's two
> different people saying the same words .I have to say they are equal.
> I have done goggling but i not able to predict what is really needed
> for my project. I don't know ,where to start and how to start.How can
> i compare audio files in that manner.Please help me out with any idea
> about it.What is the correct approach for my concept.
Like "kill the president" I assume, or "let's bomb americans"! Yes, could have lots of applications.

hardy
Reply by Tim Wescott October 27, 2010
On 10/27/2010 11:26 AM, glen herrmannsfeldt wrote:
> Tim Wescott<tim@seemywebsite.com> wrote:
> (snip)
>
>> Picking out two different recordings, made in two different places, of
>> the same sound source may be possible, but you'd have all sorts of
>> complications because of different acoustics causing different echos and
>> reverberations.
>
>> But two _different_ people saying the same word? Oh man -- that's a
>> task that humans don't always get right; getting a machine that could do
>> it reliably would require a team of people for a good amount of time.
>> I'm not even sure it's been done, but if it is it's being done by a
>> well-connected researcher who's good a writing grant proposals and at
>> getting work out of his grad students.
>
> It seems that it is good enough for companies to (try to) use it.
>
> More and more phone response systems, such as banks and airlines,
> are using it. I usually find it easier to put in the account
> number or flight number using the keypad, but they expect one
> to "say" the account or flight number. Sometimes it gets it right,
> other times not.
>
> I remember one about 30 years agot that would do one digit math
> problems, and ask for the answer. Even with only ten choices,
> it got it wrong fairly often.
Independent speaker recognition of just a few words in one language is heaps more reliable than independent speaker recognition of any random utterance in any arbitrary language.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
Reply by Fred Marshall October 27, 2010
On 10/27/2010 5:47 AM, nagarajan karunakaran wrote:
> Hi, i am working on a concept where i want to compare two audio file
> (content of audio file)is identical to some extent.For ex if it's two
> different people saying the same words .I have to say they are equal.
> I have done goggling but i not able to predict what is really needed
> for my project. I don't know ,where to start and how to start.How can
> i compare audio files in that manner.Please help me out with any idea
> about it.What is the correct approach for my concept.
Even programs that do this, like Dragon NaturallySpeaking, have to be "trained". But here you didn't say anything like "the same two people", nor did you mention a training step in the process. So, I think this may be too ambitious for words! :-)

Or, maybe talk to the NSA.

Fred
Reply by Bryan October 27, 2010
On Oct 27, 5:47 am, nagarajan karunakaran <prassa...@gmail.com> wrote:
> Hi, i am working on a concept where i want to compare two audio file
> (content of audio file)is identical to some extent.For ex if it's two
> different people saying the same words .I have to say they are equal.
> I have done goggling but i not able to predict what is really needed
> for my project. I don't know ,where to start and how to start.How can
> i compare audio files in that manner.Please help me out with any idea
> about it.What is the correct approach for my concept.
As Vlad pointed out, there appears to be a large knowledge gap you need to fill if you plan on going this alone. Typically, speech recognition (which is essentially what you're proposing) is approached in a variety of ways. I propose that, after the topics Vlad suggested, you look into the following topics:

Cross-correlation
Dynamic time warping
Linear predictive coding
Formants
Markov model
Laplacian distribution

If you've never seen any of those, you may want to reconsider the scope of your project.
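
To give a feel for one item on that list, here is a minimal sketch of dynamic time warping, which scores how well two sequences line up when one is spoken faster or slower than the other. It assumes plain 1-D sequences and absolute difference as the local cost; a real recognizer would compare per-frame feature vectors instead, and the example sequences are invented.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping cost between two sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]   # same contour, half speed
fast = [0, 1, 2, 1, 0]
other = [2, 1, 0, 1, 2]                  # different contour

print(dtw_distance(slow, fast))   # 0.0: same shape, different speed
print(dtw_distance(fast, other))  # larger: the shapes differ
```

A plain sample-by-sample comparison would penalize `slow` against `fast` heavily even though they trace the same shape, which is why time warping shows up so often in word matching.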
Reply by glen herrmannsfeldt October 27, 2010
Tim Wescott <tim@seemywebsite.com> wrote:
(snip)

> Picking out two different recordings, made in two different places, of
> the same sound source may be possible, but you'd have all sorts of
> complications because of different acoustics causing different echos and
> reverberations.

> But two _different_ people saying the same word? Oh man -- that's a
> task that humans don't always get right; getting a machine that could do
> it reliably would require a team of people for a good amount of time.
> I'm not even sure it's been done, but if it is it's being done by a
> well-connected researcher who's good a writing grant proposals and at
> getting work out of his grad students.
It seems that it is good enough for companies to (try to) use it.

More and more phone response systems, such as banks and airlines, are using it. I usually find it easier to put in the account number or flight number using the keypad, but they expect one to "say" the account or flight number. Sometimes it gets it right, other times not.

I remember one about 30 years ago that would do one digit math problems, and ask for the answer. Even with only ten choices, it got it wrong fairly often.

-- glen
Reply by tjc October 27, 2010
At a broad level, you're looking at a classification problem here.  Whether
you're dealing with recorded speech, music, bird noises, or whatever else,
you need to define some set of signals which you're looking for, such as
words / phrases, songs, instruments, etc.  Once you have established your
"library" of possible signals, then it comes down to being able to take a
particular audio file and determine which of the library entries it is most
similar to.  If two audio files match the same library entry, you can say
that the two signals are a match.
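
As a rough illustration of that library idea, the sketch below assigns each recording to its nearest library entry and calls two recordings a match when they land on the same entry. It assumes each signal has already been reduced to a fixed-length feature vector (how to do that is the hard part) and uses plain Euclidean distance; the library contents and feature values are invented.

```python
import math


def nearest_entry(features, library):
    """Name of the library entry whose feature vector is closest (Euclidean)."""
    return min(library, key=lambda name: math.dist(features, library[name]))


def is_match(features_a, features_b, library):
    """Two recordings match if they map to the same library entry."""
    return nearest_entry(features_a, library) == nearest_entry(features_b, library)


# Hypothetical library: one reference feature vector per word.
library = {
    "hello": [1.0, 0.0, 0.2],
    "goodbye": [0.0, 1.0, 0.8],
}

speaker_a = [0.9, 0.1, 0.3]   # someone saying "hello"
speaker_b = [0.8, 0.2, 0.1]   # someone else saying "hello"
speaker_c = [0.1, 0.9, 0.7]   # someone saying "goodbye"

print(is_match(speaker_a, speaker_b, library))
print(is_match(speaker_a, speaker_c, library))
```

Note that a nearest-entry rule always picks something, so a practical version would probably also need a distance threshold to reject signals that are not in the library at all.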

The hard part (as previous posters have pointed out) is the classification
step.  Speech to text is a non-trivial problem, as is recognizing music,
musical instruments, etc.  If you can narrow your library down to a small
class of signals (say, single spoken words) and develop a good classifier,
then you're well on your way.  It's going to be tough otherwise.

--Tom

>Hi, i am working on a concept where i want to compare two audio file
>(content of audio file)is identical to some extent.For ex if it's two
>different people saying the same words .I have to say they are equal.
>I have done goggling but i not able to predict what is really needed
>for my project. I don't know ,where to start and how to start.How can
>i compare audio files in that manner.Please help me out with any idea
>about it.What is the correct approach for my concept.
Reply by Vladimir Vassilevsky October 27, 2010

nagarajan karunakaran wrote:
> Hi, i am working on a concept where i want to compare two audio file
> (content of audio file)is identical to some extent.
Compute waterfall spectrograms, measure the normalized distance between them. Your professor will be more than happy.
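
A minimal sketch of that suggestion, using only the Python standard library, might look like the following. The frame length, hop size, Hann window, and the particular normalization (scale each spectrogram to unit energy, then take the Euclidean distance) are all assumptions made for illustration; the post does not specify them. It also assumes both signals have the same length and sample rate.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Magnitude spectrogram: one list of frame_len // 2 + 1 bin magnitudes per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]  # Hann window
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, kept simple on purpose; an FFT library would be used in practice.
        frames.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


def normalized_distance(spec_a, spec_b):
    """Euclidean distance between two spectrograms after scaling each to unit norm."""
    flat_a = [v for frame in spec_a for v in frame]
    flat_b = [v for frame in spec_b for v in frame]
    norm_a = math.sqrt(sum(v * v for v in flat_a)) or 1.0
    norm_b = math.sqrt(sum(v * v for v in flat_b)) or 1.0
    return math.sqrt(sum((x / norm_a - y / norm_b) ** 2
                         for x, y in zip(flat_a, flat_b)))


def tone(freq_hz, rate=8000, count=512, amplitude=1.0):
    """Test signal: a sine wave sampled at the given rate."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate) for n in range(count)]


spec_loud = spectrogram(tone(500))
spec_quiet = spectrogram(tone(500, amplitude=0.5))
spec_other = spectrogram(tone(1500))

print(normalized_distance(spec_loud, spec_quiet))  # near zero: same content, different level
print(normalized_distance(spec_loud, spec_other))  # much larger: different content
```

This kind of measure works for near-identical recordings; it does not by itself handle two different speakers or different speaking rates.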
> For ex if it's two
> different people saying the same words .I have to say they are equal.
Convert speech to text, compare the texts.
> I have done goggling but i not able to predict what is really needed
> for my project. I don't know ,where to start and how to start.
> How can i compare audio files in that manner.Please help me out with any idea
> about it.What is the correct approach for my concept.
To begin with, learn the basics. Fourier, Z-transform, FIR and IIR filters, etc.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com