In a recent thread about a very minimal voice recognition project, Rafael Deliano commented that the second derivative of the speech was used to characterize the signal. My math is many decades in the past, but I was wondering how the cepstrum and the second time derivative might be related.
Is there a strong relationship between cepstrum of signal and 2nd derivative of signal?
Started by ●January 16, 2008
Reply by ●January 16, 2008
On Jan 17, 2:21 am, Richard Owlett <rowl...@atlascomm.net> wrote:
> In a recent thread about a very minimal voice recognition project,
> Rafael Deliano commented that the second derivative of the speech was
> used to characterize the signal.
>
> My math is many decades in past, but I was wondering how cepstrum and
> second time derivative might be related.

The cepstrum is the Fourier transform of the log of the power spectrum. (Nowadays it is often described as the inverse rather than the direct Fourier transform; for the real, even log power spectrum of a real signal the two differ only by a scale factor.)

The idea is that convolution in the time domain becomes multiplication in the frequency domain, and taking logs turns that into addition, since log(a.b) = log(a) + log(b). We can separate (deconvolve) by subtraction, since log(a/b) = log(a) - log(b).

Now if we take the cepstrum of a differentiated signal, we need to examine what the frequency-domain representation of differentiation is. For Laplace it is multiplication by s, and for Fourier I suppose it is jw (j omega), if we can get away with it. The power spectrum would then differ from the original only by a factor of w^2. Taking logs, we would get an extra term 2 log(w) for each differentiation, so 4 log(w) for the second derivative. We would then need to Fourier transform the whole lot. Maybe somebody can do the maths.

H
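A quick numerical check of the argument above, as a rough sketch rather than a derivation: because the log turns the differentiator's gain into an additive term, the cepstrum of a filtered signal should equal the cepstrum of the original plus a fixed offset that depends only on the filter. The function names, the circular filtering via `np.roll`, and the "leaky" coefficient `a = 0.95` are all assumptions made for this illustration. A leak is used because a pure difference has zero gain at DC, which makes the log blow up, the discrete counterpart of log(w) at w = 0.

```python
import numpy as np

def real_cepstrum(x):
    """Inverse FFT of the log power spectrum (one common definition)."""
    power = np.abs(np.fft.fft(x)) ** 2
    return np.fft.ifft(np.log(power)).real

def leaky_second_difference(x, a=0.95):
    """Apply y[n] = x[n] - a*x[n-1] twice, circularly.

    a < 1 is an assumption for this sketch: it keeps the filter gain
    away from zero at DC so the log power spectrum stays finite.
    """
    for _ in range(2):
        x = x - a * np.roll(x, 1)
    return x

rng = np.random.default_rng(0)
n = 512
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)

# Change in the cepstrum caused by double differencing, for two unrelated signals.
offset1 = real_cepstrum(leaky_second_difference(x1)) - real_cepstrum(x1)
offset2 = real_cepstrum(leaky_second_difference(x2)) - real_cepstrum(x2)

# Cepstrum of the filter itself, obtained from its impulse response.
impulse = np.zeros(n)
impulse[0] = 1.0
filter_cepstrum = real_cepstrum(leaky_second_difference(impulse))

print(np.allclose(offset1, offset2), np.allclose(offset1, filter_cepstrum))
```

So differentiating shifts the cepstrum by a signal-independent term but adds no information that the cepstrum of the original signal did not already contain.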
Reply by ●January 17, 2008
> Is there a strong relationship between cepstrum of signal and
> 2nd derivative of signal?

I doubt it.

> minimal voice recognition project,
> the second derivative of the speech was used to characterize the
> signal.

Infinite peak clipping ("1-bit speech") is usually attributed to Licklider, about 1946. The double differentiator as an improvement in such systems came in about 1970 from Thomas and Niederjohn. Many of these old papers are collected in:
Lim, "Speech Enhancement", Prentice Hall, 1983
It fits in well with Kryter's Articulation Index, 1962:
http://www.embeddedforth.de/temp/dd2.pdf

During the 1970s many speech recognition systems used double
http://www.embeddedforth.de/temp/dd1.pdf
and triple differentiators
http://www.embeddedforth.de/temp/dd3.pdf
to better detect unvoiced sounds.

Interstate & clones (1-bit zero-crossing input) only evaluate 3 states:
silence ( low frequency )
voiced ( medium " )
unvoiced ( high " )
To get "silence", a small amount of threshold in the KOP is required. For voiced / unvoiced separation one needs to enhance the weak unvoiced sounds.

Regards, JRD
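The three-state scheme described above can be sketched in a few lines. This is only an illustration under assumed numbers, not the circuit from the linked papers: the 8 kHz sample rate, the comparator threshold, the two frequency limits and all function names are invented for the example, and a 200 Hz sine and white noise are crude stand-ins for voiced and unvoiced speech.

```python
import math
import random

def second_difference(x):
    """Discrete double differentiator: d2[i] = x[i+2] - 2*x[i+1] + x[i]."""
    return [x[i + 2] - 2.0 * x[i + 1] + x[i] for i in range(len(x) - 2)]

def count_crossings(x, threshold):
    """Count output flips of a 1-bit comparator with a small dead zone."""
    crossings = 0
    state = 0  # 0 = no decision yet, +1 / -1 = last comparator output
    for v in x:
        if v > threshold:
            new_state = 1
        elif v < -threshold:
            new_state = -1
        else:
            continue  # inside the dead zone: the output holds its state
        if state != 0 and new_state != state:
            crossings += 1
        state = new_state
    return crossings

def classify(frame, fs=8000.0, threshold=0.01,
             silence_max_hz=50.0, voiced_max_hz=1500.0):
    """Label a frame by its apparent zero-crossing frequency (assumed limits)."""
    d2 = second_difference(frame)
    duration = len(d2) / fs
    # Two crossings per cycle, so this is an apparent frequency in Hz.
    freq = count_crossings(d2, threshold) / (2.0 * duration)
    if freq < silence_max_hz:
        return "silence"
    return "voiced" if freq < voiced_max_hz else "unvoiced"

fs = 8000.0
n = 800  # 100 ms at 8 kHz
random.seed(0)
quiet = [0.0001 * random.gauss(0.0, 1.0) for _ in range(n)]
tone = [math.sin(2.0 * math.pi * 200.0 * i / fs) for i in range(n)]
hiss = [0.1 * random.gauss(0.0, 1.0) for _ in range(n)]

print(classify(quiet), classify(tone), classify(hiss))
```

The threshold is what produces the "silence" state: a signal that never leaves the dead zone never flips the comparator, so its apparent frequency is zero.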
Reply by ●January 17, 2008
Rafael Deliano wrote:
>> Is there a strong relationship between cepstrum of signal and
>> 2nd derivative of signal?
>
> I doubt it.
>
>> minimal voice recognition project,
>> the second derivative of the speech was used to characterize the
>> signal.
>
> Infinite peak clipping ( "1 Bit speech" ) is usually attributed to
> Licklider about 1946. The double differentiator as an improvement
> in such systems comes in about 1970 by Thomas, Niederjohn. Many of
> these old papers are collected in:
> Lim "Speech Enhancement" Prentice Hall 1983
> Fits well in with Kryter's Articulation Index 1962:
> http://www.embeddedforth.de/temp/dd2.pdf
>
> During the 1970s many speech recognition systems used
> double
> http://www.embeddedforth.de/temp/dd1.pdf
> and triple differentiators
> http://www.embeddedforth.de/temp/dd3.pdf
> to better detect unvoiced sounds.
>
> Interstate & clones ( 1 bit zero-crossing input )
> only evaluate 3 states:
> silence ( low frequency )
> voiced ( medium " )
> unvoiced ( high " )
> To get "silence" a small amount of threshold in
> the KOP is required. For voiced / unvoiced separation one
> needs to enhance the weak unvoiced sounds.
>
> Regards, JRD

Thank you. Wish I could read German. There are enough English-German cognates in technical vocabulary to really tease me ;)

Googling for [ "articulation index" "speech recognition" -impaired ] yields an interesting collection of articles. Most are audiology related. Some are about artificial speech recognition. I get the impression that though "articulation index" has been replaced as a method for speech recognition, there is current research on using it as an aid in noisy environments. More reading ahead. Thanks again.
Reply by ●January 17, 2008
On Jan 17, 9:12 pm, Richard Owlett <rowl...@atlascomm.net> wrote:
> Thank you. Wish I could read German. There's enough English-German
> cognates in technical vocabulary to really tease me ;)
>
> Googling for [ "articulation index" "speech recognition" -impaired ]
> yields an interesting collection of articles. Most are audiology
> related. Some are about artificial speech recognition. I get the
> impression that though "articulation index" has been replaced as method
> for speech recognition there is current research on using it as an aid
> in noisy environment. More reading ahead. Thanks again.

Hello Richard, I will put in my two cents.
Currently, speech recognition is done via Hidden Markov Models (HMMs), which are 'fed' with cepstral coefficients obtained by framing the voice signal. It has been known for quite a long time that speech dynamics, that is, the rate of variation of the voice characteristics, carry very important information, but including this directly in the HMM framework is not possible. So what has been done is to compute the first-derivative (delta) coefficients of the cepstral vector, and from those the second-derivative (delta-delta) coefficients. These are appended to the feature vector; speech recognition results improved, and this has become standard in today's speech recognition systems.

The articulation index is a measure of speech intelligibility developed by Fletcher in early experiments at AT&T for telephone communications. You can find more about it in the paper "How Do Humans Process and Recognize Speech?" by J. B. Allen, which appeared in IEEE Transactions on Speech and Audio Processing, volume 2, number 4, 1994.

Regards,
Juan Pablo
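To make the delta and delta-delta idea concrete, here is a minimal sketch. It uses the regression-style delta formula that many recognition front ends use; the window width of 2, the edge padding, the 13-coefficient toy data and the function name `delta` are assumptions for illustration only. Note that the derivatives are taken along the frame (time) axis of the cepstral vectors, not of the speech waveform itself.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based time derivative of a (num_frames, num_coeffs) array.

    d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n} n^2),
    with the first and last frames repeated at the edges (an assumption).
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom

# Toy stand-in for cepstral vectors: coefficient k rises by (k + 1) per frame.
num_frames, num_coeffs = 20, 13
slopes = np.arange(1, num_coeffs + 1, dtype=float)
cepstra = np.arange(num_frames, dtype=float)[:, None] * slopes

d = delta(cepstra)    # first derivative (delta)
dd = delta(d)         # second derivative (delta-delta)
feature_vectors = np.hstack([cepstra, d, dd])

print(feature_vectors.shape)
```

Away from the padded edges, the deltas of this linear toy data equal the slopes and the delta-deltas are zero, which is a handy sanity check before feeding real cepstra through the same function.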
Reply by ●January 17, 2008
On Jan 16, 8:21 am, Richard Owlett <rowl...@atlascomm.net> wrote:
> In a recent thread about a very minimal voice recognition project,
> Rafael Deliano commented that the second derivative of the speech was
> used to characterize the signal.
>
> My math is many decades in past, but I was wondering how cepstrum and
> second time derivative might be related.

Hi Richard,

Complex cepstrum (includes the effect of FFT phase) or real cepstrum (no FFT phase used)?

Dirk
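For readers unfamiliar with the distinction, a small sketch may help. Both functions below are naive illustrations with assumed names: they use the log magnitude (using log power instead would simply double the values), and the complex version relies on plain `np.unwrap`, which is only adequate for well-behaved inputs such as the short minimum-phase test sequence chosen here. Robust complex-cepstrum code also has to remove linear phase first.

```python
import numpy as np

def real_cepstrum(x):
    """Uses only the magnitude of the spectrum; the phase is discarded."""
    spectrum = np.fft.fft(x)
    return np.fft.ifft(np.log(np.abs(spectrum))).real

def complex_cepstrum(x):
    """Uses log magnitude plus unwrapped phase (naive sketch)."""
    spectrum = np.fft.fft(x)
    log_spectrum = np.log(np.abs(spectrum)) + 1j * np.unwrap(np.angle(spectrum))
    return np.fft.ifft(log_spectrum).real

# Short minimum-phase sequence x = [1, 0.5, 0, 0, ...]; its phase never wraps.
n = 64
x = np.zeros(n)
x[0], x[1] = 1.0, 0.5

c_real = real_cepstrum(x)
c_complex = complex_cepstrum(x)

# Index 1 is the first positive quefrency, index -1 the first negative one.
print(c_complex[1], c_complex[-1], c_real[1], c_real[-1])
```

For this input the complex cepstrum is one-sided (0.5 at index 1, essentially 0 at index -1), while the real cepstrum is its even part (0.25 at both), which is the price of throwing the phase away.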
Reply by ●January 17, 2008
dbell wrote:
> Hi Richard,
>
> Complex cepstrum (includes effect of FFT phase) or real cepstrum (no
> FFT phase used)?
>
> Dirk

Guess I know even less than I thought ;)
So I suspect the appropriate answer would be "both/either".
My goal is to know what people are talking about rather than to be able to apply it. I had ~2 years of college math >40 years ago, and then actually used little of it.
Reply by ●January 17, 2008
On Jan 17, 1:47 pm, Richard Owlett <rowl...@atlascomm.net> wrote:
> Guess I know even less than I thought ;)
> So I suspect the appropriate answer would be "both/either".
> My goal is to know what people are talking about than being able to
> apply it. I had ~2 years of college math >40 years ago, and then
> actually used little of it.

Richard,

By "strong relationship" do you mean that they are similar, or that one is in some (possibly complex) way related to the other?

Dirk
Reply by ●January 17, 2008
dbell wrote:
> Richard,
>
> By "strong relationship" do you mean are they similar or that one is
> in some (possibly complex) way related to the other?
>
> Dirk

I do not know. I do not even know what questions I should be asking.
I hope others will chime in with questions I _should_ *ASK* [hint to BCC recipients ;]
Reply by ●January 18, 2008
> I get the impression that though "articulation index" has been
> replaced as method for speech recognition there is current
> research on using it as an aid in noisy environment.

It wasn't a method of speech recognition, but was used to measure speech intelligibility in noise. It is an old ANSI standard, from about 1969, and not so common anymore. But it shows which parts of the spectrum are important. It should be compared to the long-term spectrum of speech:
http://www.embeddedforth.de/temp/hoth.pdf

> Wish I could read German.

( Looking at the pictures is usually sufficient ... )

Normally there is not much energy in the unvoiced parts of speech. The Hoth noise mentioned there is a simulation of the noise in a domestic/office environment. It is low frequency. Both are reasons for the (double) differentiator.

That's the original circuit for the VCP200 (plus much German text):
http://www.embeddedforth.de/temp/vcp1.pdf

The very first speech recognition unit (1952; Davis, Biddulph, Balashek) used a different circuit with two channels:
http://www.embeddedforth.de/temp/vcp2.pdf
It resurfaced in the 1970s.

What do you want to build?

Regards, JRD
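To put rough numbers on why a double differentiator helps against low-frequency room noise and weak high-frequency consonants, here is a small sketch of the gain of a discrete second difference. The 8 kHz sample rate, the chosen test frequencies and the function name are assumptions; an analog differentiator pair would behave similarly at frequencies well below half the sample rate.

```python
import math

def second_difference_gain_db(freq_hz, fs=8000.0):
    """Gain in dB of y[n] = x[n] - 2*x[n-1] + x[n-2] at freq_hz.

    The magnitude response is (2 * sin(pi * f / fs)) ** 2.
    """
    gain = (2.0 * math.sin(math.pi * freq_hz / fs)) ** 2
    return 20.0 * math.log10(gain)

for f in (100.0, 200.0, 400.0, 3000.0):
    print(f, round(second_difference_gain_db(f), 1))

# Slope per octave at low frequencies, and the boost of 3 kHz relative to 100 Hz.
low_octave_step = second_difference_gain_db(200.0) - second_difference_gain_db(100.0)
rumble_vs_hiss = second_difference_gain_db(3000.0) - second_difference_gain_db(100.0)
```

The response rises by about 12 dB per octave, so energy around 3 kHz ends up roughly 55 dB stronger relative to 100 Hz content than it was before the filter.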






