On Dec 1, 7:52 am, fatalist <simfid...@gmail.com> wrote:
> On Nov 27, 7:45 pm, Raeldor <rael...@gmail.com> wrote:
>
> > Hi All,
>
> > I am trying to find the best way to analyze 2 speech signals from 2
> > different speakers and come up with a 'match percentage' as to how
> > close they were (ie, did the speakers say the same thing).
>
> > I have been reading that LPC is great for encoding the basic
> > parameters of speech, but most of the articles are related to
> > compression or building of a vocoder. Has anyone had experience of or
> > know any good resources or c/c++ libraries for extracting the
> > parameters of speech for comparison? Is LPC likely the best algorithm
> > for this type of work?
>
> > Thanks
> > Ray
>
> Google MFCC, PLP, DP, HMM etc. etc. etc
>
> Go through about 1000 references
>
> Better yet, forget about the whole thing: it's called ASR (automatic
> speech recognition)
>
> Google it too...
>
> But, just for starters:
>
> http://cslu.cse.ogi.edu/toolkit/
> http://cmusphinx.sourceforge.net/
> http://www.isip.piconepress.com/projects/speech/software/
>
> This is all crap anyway...
Thank you for these links. Looks like there's a lot of good info I
haven't seen yet. I guess having the right terminology helps! :)
Reply by fatalist●December 1, 2010
On Nov 27, 7:45 pm, Raeldor <rael...@gmail.com> wrote:
> Hi All,
>
> I am trying to find the best way to analyze 2 speech signals from 2
> different speakers and come up with a 'match percentage' as to how
> close they were (ie, did the speakers say the same thing).
>
> I have been reading that LPC is great for encoding the basic
> parameters of speech, but most of the articles are related to
> compression or building of a vocoder. Has anyone had experience of or
> know any good resources or c/c++ libraries for extracting the
> parameters of speech for comparison? Is LPC likely the best algorithm
> for this type of work?
>
> Thanks
> Ray
Reply by Vladimir Vassilevsky●November 30, 2010
Raeldor wrote:
> On Nov 29, 3:12 pm, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
>
>>Clay wrote:
>>
>>>On Nov 27, 7:45 pm, Raeldor <rael...@gmail.com> wrote:
>>
>>>>Hi All,
>>
>>>>I am trying to find the best way to analyze 2 speech signals from 2
>>>>different speakers and come up with a 'match percentage' as to how
>>>>close they were (ie, did the speakers say the same thing).
>>
>>>>I have been reading that LPC is great for encoding the basic
>>>>parameters of speech, but most of the articles are related to
>>>>compression or building of a vocoder. Has anyone had experience of or
>>>>know any good resources or c/c++ libraries for extracting the
>>>>parameters of speech for comparison? Is LPC likely the best algorithm
>>>>for this type of work?
>>
>>>You can use either dynamic time warping or hidden markov models. You
>>>may use either the LPC coefs or convert them to PARCORs in either of
>>>these two cost mechanisms.
>>
>>Problem is, there is not much relation between the perceptual similarity
>>and the similarity of LPCs or ParCors. I.e. neither LPCs nor parcors are
>>good functions for pattern matching; LSPs or cepstrum coefficients could
>>be better in this regard. But, it's all minor technicalities compared to
>>the incredible problem stated by OP.
>>
> Maybe I should explain where I am at the moment.
> I am calculating the FFT of a 128-sample frame windowed with a
> Gaussian window, then converting this to decibels. I can see the peaks
> of the formants visually, but there is a lot of noise on the graph (of
> the data), I think probably caused by the harmonics. Is there a way
> to clean this noise so I can see the formant peaks as smooth peaks in
> the graph? The smaller (128-sample) frame size helped with this, as did
> the Gaussian window, but I can't help but think there is a better
> approach for this.
Raeldor,
This is business. You can hire my services; contact me through the web
site. You can also consider the filterbank TMS C67x software offered by
Bharat Pathak.
Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
Reply by Raeldor●November 30, 2010
On Nov 29, 3:12 pm, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
> Clay wrote:
> > On Nov 27, 7:45 pm, Raeldor <rael...@gmail.com> wrote:
>
> >>Hi All,
>
> >>I am trying to find the best way to analyze 2 speech signals from 2
> >>different speakers and come up with a 'match percentage' as to how
> >>close they were (ie, did the speakers say the same thing).
>
> >>I have been reading that LPC is great for encoding the basic
> >>parameters of speech, but most of the articles are related to
> >>compression or building of a vocoder. Has anyone had experience of or
> >>know any good resources or c/c++ libraries for extracting the
> >>parameters of speech for comparison? Is LPC likely the best algorithm
> >>for this type of work?
>
> >>Thanks
> >>Ray
>
> > You can use either dynamic time warping or hidden markov models. You
> > may use either the LPC coefs or convert them to PARCORs in either of
> > these two cost mechanisms.
>
> Problem is, there is not much relation between the perceptual similarity
> and the similarity of LPCs or ParCors. I.e. neither LPCs nor parcors are
> good functions for pattern matching; LSPs or cepstrum coefficients could
> be better in this regard. But, it's all minor technicalities compared to
> the incredible problem stated by OP.
>
> Vladimir Vassilevsky
> DSP and Mixed Signal Design Consultant
> http://www.abvolt.com
Maybe I should explain where I am at the moment.
I am calculating the FFT of a 128-sample frame windowed with a
Gaussian window, then converting this to decibels. I can see the peaks
of the formants visually, but there is a lot of noise on the graph (of
the data), I think probably caused by the harmonics. Is there a way
to clean this noise so I can see the formant peaks as smooth peaks in
the graph? The smaller (128-sample) frame size helped with this, as did
the Gaussian window, but I can't help but think there is a better
approach for this.
Thanks
Ray
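One common way to turn a harmonic-rippled log spectrum like the one described above into a smooth envelope is cepstral smoothing: take the inverse transform of the log magnitude spectrum, keep only the low-quefrency coefficients, and transform back. What follows is a minimal pure-Python sketch of that idea, not code from anyone in this thread. The function names (`dft`, `smoothed_log_spectrum`) and the parameters `keep` and `sigma` are hypothetical, and the naive O(N^2) DFT is used only to keep the example self-contained; a real application would call an FFT library.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for a 128-sample frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def smoothed_log_spectrum(frame, keep=12, sigma=0.4):
    """Return (raw, smooth) log-magnitude spectra of one frame.

    keep  -- number of low-quefrency cepstral coefficients retained (assumed value)
    sigma -- Gaussian window width as a fraction of half the frame (assumed value)
    """
    n = len(frame)
    half = (n - 1) / 2.0
    # Gaussian window centred on the frame.
    windowed = [s * math.exp(-0.5 * ((i - half) / (sigma * half)) ** 2)
                for i, s in enumerate(frame)]
    # Log magnitude spectrum; the small floor avoids log(0).
    log_mag = [math.log(abs(c) + 1e-12) for c in dft(windowed)]
    # Real cepstrum. log_mag is real and even, so its inverse DFT
    # equals its forward DFT divided by n.
    cep = [c.real / n for c in dft(log_mag)]
    # Lifter: keep low quefrencies (and their mirror images), zero the rest.
    liftered = [c if (q < keep or q > n - keep) else 0.0
                for q, c in enumerate(cep)]
    # Back to the frequency domain: this is the smoothed log spectrum.
    smooth = [c.real for c in dft(liftered)]
    return log_mag, smooth
```

The values are in natural-log units; multiply by 20 / ln(10) to get decibels. As a rule of thumb, `keep` should stay below the pitch period measured in samples, otherwise the harmonic ripple leaks back into the envelope.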
Reply by Raeldor●November 29, 2010
On Nov 29, 3:12 pm, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
> Clay wrote:
> > On Nov 27, 7:45 pm, Raeldor <rael...@gmail.com> wrote:
>
> >>Hi All,
>
> >>I am trying to find the best way to analyze 2 speech signals from 2
> >>different speakers and come up with a 'match percentage' as to how
> >>close they were (ie, did the speakers say the same thing).
>
> >>I have been reading that LPC is great for encoding the basic
> >>parameters of speech, but most of the articles are related to
> >>compression or building of a vocoder. Has anyone had experience of or
> >>know any good resources or c/c++ libraries for extracting the
> >>parameters of speech for comparison? Is LPC likely the best algorithm
> >>for this type of work?
>
> >>Thanks
> >>Ray
>
> > You can use either dynamic time warping or hidden markov models. You
> > may use either the LPC coefs or convert them to PARCORs in either of
> > these two cost mechanisms.
>
> Problem is, there is not much relation between the perceptual similarity
> and the similarity of LPCs or ParCors. I.e. neither LPCs nor parcors are
> good functions for pattern matching; LSPs or cepstrum coefficients could
> be better in this regard. But, it's all minor technicalities compared to
> the incredible problem stated by OP.
>
> Vladimir Vassilevsky
> DSP and Mixed Signal Design Consultant
> http://www.abvolt.com
Maybe I should explain where I am at the moment.
I am calculating the FFT of a 128-sample frame windowed with a
Gaussian window, then converting this to decibels. I can see the peaks
of the formants visually, but there is a lot of noise on the graph (of
the data), I think probably caused by the harmonics. Is there a way
to clean this noise so I can see the formant peaks as smooth peaks in
the graph? The smaller (128-sample) frame size helped with this, as did
the Gaussian window, but I can't help but think there is a better
approach for this.
Thanks
Ray
Reply by Vladimir Vassilevsky●November 29, 2010
Clay wrote:
> On Nov 27, 7:45 pm, Raeldor <rael...@gmail.com> wrote:
>
>>Hi All,
>>
>>I am trying to find the best way to analyze 2 speech signals from 2
>>different speakers and come up with a 'match percentage' as to how
>>close they were (ie, did the speakers say the same thing).
>>
>>I have been reading that LPC is great for encoding the basic
>>parameters of speech, but most of the articles are related to
>>compression or building of a vocoder. Has anyone had experience of or
>>know any good resources or c/c++ libraries for extracting the
>>parameters of speech for comparison? Is LPC likely the best algorithm
>>for this type of work?
>>
>>Thanks
>>Ray
>
>
> You can use either dynamic time warping or hidden markov models. You
> may use either the LPC coefs or convert them to PARCORs in either of
> these two cost mechanisms.
Problem is, there is not much relation between the perceptual similarity
and the similarity of LPCs or ParCors. I.e. neither LPCs nor parcors are
good functions for pattern matching; LSPs or cepstrum coefficients could
be better in this regard. But, it's all minor technicalities compared to
the incredible problem stated by OP.
Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
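To make the LPC, PARCOR and cepstrum terminology above concrete, here is a minimal pure-Python sketch of how the three relate: the Levinson-Durbin recursion yields both the predictor coefficients and the reflection (PARCOR) coefficients from a frame's autocorrelation, and a standard recursion converts the predictor coefficients to cepstral coefficients, which can then be compared with a plain Euclidean distance. The function names are hypothetical, and the code assumes the predictor convention x[n] ~ sum(a_k * x[n-k]); PARCOR sign conventions differ between textbooks.

```python
import math

def levinson_durbin(r, order):
    """LPC and PARCOR coefficients from autocorrelation values r[0..order].

    Assumes r[0] > 0 (a silent frame would divide by zero).
    Returns (a, parcor, err) with a[k-1] = a_k for k = 1..order.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    parcor = []
    err = r[0]                       # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient for stage i
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
        parcor.append(k)
    return a[1:], parcor, err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of the all-pole model 1/(1 - sum a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)         # c[0] is unused (gain term omitted)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))
```

For a first-order model with pole 0.5 (autocorrelation 1, 0.5, 0.25), `levinson_durbin` gives a = [0.5, 0] and `lpc_to_cepstrum` gives 0.5, 0.125, 0.0417, matching the closed form c_n = 0.5^n / n.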
Reply by Clay●November 29, 2010
On Nov 27, 7:45 pm, Raeldor <rael...@gmail.com> wrote:
> Hi All,
>
> I am trying to find the best way to analyze 2 speech signals from 2
> different speakers and come up with a 'match percentage' as to how
> close they were (ie, did the speakers say the same thing).
>
> I have been reading that LPC is great for encoding the basic
> parameters of speech, but most of the articles are related to
> compression or building of a vocoder. Has anyone had experience of or
> know any good resources or c/c++ libraries for extracting the
> parameters of speech for comparison? Is LPC likely the best algorithm
> for this type of work?
>
> Thanks
> Ray
You can use either dynamic time warping or hidden markov models. You
may use either the LPC coefs or convert them to PARCORs in either of
these two cost mechanisms.
Clay
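For reference, the dynamic time warping suggested above can be sketched in a few lines. This is a minimal textbook version in pure Python, not code from anyone in this thread: it assumes each utterance has already been turned into a sequence of per-frame feature vectors (LPC, PARCOR, cepstral or otherwise), and the names `frame_distance` and `dtw_cost` are hypothetical.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_cost(seq_a, seq_b):
    """Length-normalised DTW alignment cost between two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = cheapest accumulated distance aligning
    # the first i frames of seq_a with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Allowed moves: advance in seq_a, advance in seq_b, or both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m] / (n + m)
```

A cost of zero means one sequence is a time-stretched copy of the other. Mapping the cost onto a 'match percentage' is a separate calibration step that depends on the features and on recordings of known matches and mismatches.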
Reply by bharat pathak●November 28, 2010
I was working on a filterbank approach to speech recognition.
The project was on isolated word recognition: the user speaks each
word in a set (for example, 1 to 10) 8 times so that the templates
can be averaged.
Templates were computed using filter banks, LPFs and decimation.
The stored 2D templates were then compared with the incoming
word sample to find the match.
I have working C code, which was also ported to the TI C6713 DSP platform.
I am willing to sell the code if anyone is interested in buying it.
Regards
Bharat
Reply by Raeldor●November 28, 2010
On Nov 27, 7:08 pm, HardySpicer <gyansor...@gmail.com> wrote:
> On Nov 28, 1:45 pm, Raeldor <rael...@gmail.com> wrote:
>
> > Hi All,
>
> > I am trying to find the best way to analyze 2 speech signals from 2
> > different speakers and come up with a 'match percentage' as to how
> > close they were (ie, did the speakers say the same thing).
>
> > I have been reading that LPC is great for encoding the basic
> > parameters of speech, but most of the articles are related to
> > compression or building of a vocoder. Has anyone had experience of or
> > know any good resources or c/c++ libraries for extracting the
> > parameters of speech for comparison? Is LPC likely the best algorithm
> > for this type of work?
>
> > Thanks
> > Ray
>
> Try Matlab.
Not sure that's an option, as I need to integrate the end results into
a practical application. I have read about Matlab's DSP toolkit, and
it looks really nice, but unless they provide source code or a c/c++
library for the functions it'll be very hard to translate it into my
application.
Reply by HardySpicer●November 27, 2010
On Nov 28, 1:45 pm, Raeldor <rael...@gmail.com> wrote:
> Hi All,
>
> I am trying to find the best way to analyze 2 speech signals from 2
> different speakers and come up with a 'match percentage' as to how
> close they were (ie, did the speakers say the same thing).
>
> I have been reading that LPC is great for encoding the basic
> parameters of speech, but most of the articles are related to
> compression or building of a vocoder. Has anyone had experience of or
> know any good resources or c/c++ libraries for extracting the
> parameters of speech for comparison? Is LPC likely the best algorithm
> for this type of work?
>
> Thanks
> Ray