[ Please note that ' and/or " are SIGNIFICANT ]

I'm revisiting the topic as Google (has lost)/(is hiding) previous thread. [ Couldn't be I don't remember keywords COULD IT ;?

I'm new to DSP
I'm new to speech recognition

The speech recognition literature that I can find on WEB doesn't answer questions I have in a form I can comprehend.

So I've taken an experimental approach.
Must be reasonably valid as I have "discovered" formants :)

*NOTA BENE*
I know I'm reinventing wheel.
Part of my "purpose" is 'discovering' fire.

My current experiment is characterizing speech by observing time variance of power in sub-octave bands as a function of time.

IIRC, The last time I raised the issue, someone said something about filter banks.

At the time it did not make sense to me so I pursued my experimental method of doing a batch of FFT's equally spaced in time and attempting a "3D plot".

I've gotten far enough now to understand why suggestion was given.
BUT, I've lost the suggestion.

What I'm considering is:
1. do a series of FFT's on constant-width windows with a constant time offset.
2. on each FFT, sum the squares of (fft)bin values over appropriate range of bins
3. create some sort of 3D surface plot of result

Is there a computationally &/or ????? simpler approach?
Real time is NOT an issue as sample speech is on studio quality CD.
I'm running this experiment in Scilab so I'm looking more for approaches with simple math than "efficient" chip based DSP.

Side question.
"What question do you think I *should* be asking?"
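For readers following along, steps 1 and 2 of the plan above can be sketched as follows. This is only an illustration in Python rather than Scilab, under stated assumptions: the names `dft`, `band_power_frames`, and `bands` are hypothetical, the Hann window is an addition the post does not mention, and a plain O(N^2) DFT stands in for a real FFT routine so that the sketch has no dependencies.

```python
import cmath
import math


def dft(frame):
    """Plain O(N^2) DFT of a real frame, returning bins 0..N/2.

    Stands in for an FFT call so the sketch needs no external library.
    """
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def band_power_frames(signal, width, hop, bands):
    """Steps 1 and 2: one row per window position, one column per band.

    bands is a list of inclusive (first_bin, last_bin) pairs.
    """
    # Hann window to limit leakage between bins (an assumption, not in the post).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / width) for t in range(width)]
    rows = []
    for start in range(0, len(signal) - width + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(width)]
        spectrum = dft(frame)
        # Sum of squared bin magnitudes over each band's range of bins.
        rows.append([
            sum(abs(spectrum[k]) ** 2 for k in range(lo, hi + 1))
            for lo, hi in bands
        ])
    return rows


# Toy input: a 1 kHz tone sampled at 8 kHz lands on bin 8 of a 64-point DFT.
fs, width, hop = 8000, 64, 32
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]
bands = [(1, 3), (4, 7), (8, 15), (16, 31)]  # hypothetical octave-wide bin ranges
surface = band_power_frames(tone, width, hop, bands)
```

`surface` is then the time-by-band grid that step 3 would hand to whatever plotting routine is preferred. For real speech the window width, hop, and band edges would all need choosing to suit the material.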
Characterizing speech -- revisited [ long winded with question at end ]
Started by ●August 10, 2004
Reply by ●August 10, 2004
Richard Owlett wrote:
> [snip]
> What I'm considering is:
> 1. do a series of FFT's on constant-width windows with a constant
>    time offset.
> 2. on each FFT, sum the squares of (fft)bin values over appropriate
>    range of bins
> 3. create some sort of 3D surface plot of result

Without commenting on the rest, a 3D surface plot requires the additional complexity of a means of viewing it. A false color plot would be easier. The X and Y would be the same as for a surface, but the z value would be represented by different colors.

-jim
Reply by ●August 10, 2004
"Richard Owlett" <rowlett@atlascomm.net> wrote in message news:10hibo6o2lr1h1d@corp.supernews.com...
> [snip]
> What I'm considering is:
> 1. do a series of FFT's on constant-width windows with a constant
>    time offset.
> 2. on each FFT, sum the squares of (fft)bin values over appropriate
>    range of bins
> 3. create some sort of 3D surface plot of result
>
> Is there a computationally &/or ????? simpler approach?

Would a spectrogram plot (available in matlab - not sure about scilab) do what the OP (Richard) wants?
The question is more for others in the group than the OP, since I'm not entirely tuned into the context from the OP's previous thread.

Cheers
Bhaskar
Reply by ●August 10, 2004
jim wrote:
>
> Richard Owlett wrote:
>
[snip]
>>
>> What I'm considering is:
>> [snip]
>> 3. create some sort of 3D surface plot of result
>
> Without commenting on the rest, 3d surface plot requires the additional
> complexity of a means of viewing it. A false color plot would be easier.
> The X and Y would be the same as a surface but the z value would be
> represented by different colors.
>
> -jim

That's standard in the speech recognition world. But I've spent toooo much time in front of a Tektronix Spectrum Analyzer. "Color" just does not convey amplitude for me.

And the display issue is a PROBLEM in and of itself. Scilab &/or gnuplot should handle the problem.
Reply by ●August 11, 2004
Richard Owlett <rowlett@atlascomm.net> wrote in message news:<10hibo6o2lr1h1d@corp.supernews.com>...
> [ Please note that ' and/or " are SIGNIFICANT ]
>
> I'm revisiting the topic as Google (has lost)/(is hiding) previous
> thread. [ Couldn't be I don't remember keywords COULD IT ;?

...

> IIRC, The last time I raised the issue, someone said something about
> filter banks.

...

> I've gotten far enough now to understand why suggestion was given.
> BUT, I've lost the suggestion.

Hi Richard.

Could this be the thread you were looking for?

http://groups.google.com/groups?hl=no&lr=&ie=UTF-8&threadm=f56893ae.0310020458.7973161f%40posting.google.com&rnum=5&prev=/groups%3Fhl%3Dno%26lr%3D%26ie%3DUTF-8%26q%3Dauthor%253Aallnor%40tele.ntnu.no%2B%2522%2522spectrogram%2522%2522%26meta%3Dgroup%253Dcomp.dsp

Rune
Reply by ●August 11, 2004
Bhaskar Thiagarajan wrote:
>
> Would a spectrogram plot (available in matlab - not sure about scilab) do
> what the OP (Richard) wants?
> The question is more for others in the group than the OP since I'm not
> entirely tuned into the context from the OP's previous thread.
>

Yes, Scilab has that function, but it does not present data in a format I'm comfortable with. So what if the "whole world" uses that form ;}

My question was intended to be about the computational efficiency of calculating the data to be displayed, rather than the format for displaying the result.

Rune has found the thread. Now I'll have to review the thread I started ;/
Reply by ●August 11, 2004
Rune Allnor wrote:
> Richard Owlett <rowlett@atlascomm.net> wrote in message news:<10hibo6o2lr1h1d@corp.supernews.com>...
>
>>[ Please note that ' and/or " are SIGNIFICANT ]
>>
>>I'm revisiting the topic as Google (has lost)/(is hiding) previous
>>thread. [ Couldn't be I don't remember keywords COULD IT ;?
>
> ...
>
>>IIRC, The last time I raised the issue, someone said something about
>>filter banks.
>
> ...
>
>>I've gotten far enough now to understand why suggestion was given.
>>BUT, I've lost the suggestion.
>
> Hi Richard.
>
> Could this be the thread you were looking for?
>
> http://groups.google.com/groups?hl=no&lr=&ie=UTF-8&threadm=f56893ae.0310020458.7973161f%40posting.google.com&rnum=5&prev=/groups%3Fhl%3Dno%26lr%3D%26ie%3DUTF-8%26q%3Dauthor%253Aallnor%40tele.ntnu.no%2B%2522%2522spectrogram%2522%2522%26meta%3Dgroup%253Dcomp.dsp
>
> Rune

YES
I started it and did not recognize it ;[

Looking at it reminded me of several questions I should be asking myself.

The question of this thread should be rephrased.
Can a batch of z filter banks be computationally/programmatically more efficient than doing an FFT and appropriately summing the z intervals?
Reply by ●August 11, 2004
Richard Owlett wrote:

  ...

> The question of this thread should be rephrased.
> Can a batch of z filter banks be computationally/programmatically
> more efficient than doing an FFT and appropriately summing the z intervals?

What is a Z interval?

Jerry
--
... the worst possible design that just meets the specification - almost
a definition of practical engineering.                 .. Chris Bore
Reply by ●August 12, 2004
Richard Owlett <rowlett@atlascomm.net> wrote in message news:<10hibo6o2lr1h1d@corp.supernews.com>...
> What I'm considering is:
> 1. do a series of FFT's on constant-width windows with a constant
>    time offset.
> 2. on each FFT, sum the squares of (fft)bin values over appropriate
>    range of bins
> 3. create some sort of 3D surface plot of result
>
> Is there a computationally &/or ????? simpler approach?

There might just be. I'd suggest an approach based on filter banks that might do what you want. My approach does, however, rely on a couple of relatively tricky aspects of DSP (IIR filters and envelope detectors) that may be cumbersome to design when one is not used to working with these kinds of things. Once the system is up and running, it may be easier from a user's perspective, and might also provide results that may (or may not) be somewhat easier to interpret.

--- CAVEAT ---
The time-domain data generated below might not be useful for your speech processing application. You might nevertheless find the exercise interesting.
--- oo00oo ---

I'm using matlab syntax where appropriate, which shouldn't be too dissimilar from scilab.

Outline of algorithm:

- Decide on the number M of filter bands necessary and their respective
  bandwidths and center frequencies.
- Use some tool (or ask here how) to design IIR filters that meet the
  specs for each filter in this filter bank. Make sure (this could very
  well be a non-trivial constraint) that each filter is norm-preserving,
  i.e. that the gain is proportional to bandwidth.
- Run your signal through this filter bank to obtain M bandpass signals.
  With some luck, this operation is available and transparent in scilab
  so that you don't have to worry about exactly how it is implemented.
- Run an envelope detector on each bandpass signal (check out Ch 9.2
  in Rick's book, 2nd ed.)
- Plot the envelopes in a waterfall display:

% Start Matlab Pseudo Code %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% The pseudo code is based on the following data being available:
%
% - Filtered & amplitude processed bandpass data are assumed to be
%   stored in the NxM array "BPdata".
% - The center frequency for each band is assumed to be stored in
%   the Mx1 array "fvec".
% - The sampling frequency fs is known, as well as dt=1/fs.

[N,M]=size(BPdata);       % N samples per band, M bands
wfd=zeros(N,M);           % Assign space for waterfall data.
                          % Saves run-time when working with matlab.
tv=[0:N-1]*dt;            % Time vector for plotting
bpdata=BPdata/max(max(BPdata))*max(fvec);
                          % Normalize frequency data to relate to
                          % frequency bands. Might need fiddling...
for m=1:M
   wfd(:,m)=bpdata(:,m)+fvec(m);   % Offset data trace for waterfall display
end
plot(tv,wfd,'b')          % One trace per band, each offset to its center frequency

% End Matlab Pseudo Code %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I've got the impression from other posts of yours that you don't like surface/contour plots too much. Perhaps you find the waterfall display more convenient.

Rune
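The filtering and envelope steps of the outline above could be sketched roughly as follows, in Python rather than Matlab or Scilab. Several things here are assumptions and not part of the post: each band uses a single second-order band-pass section with the coefficient formulas from the RBJ Audio EQ Cookbook (unity gain at the center frequency, which is not the "gain proportional to bandwidth" normalization the post asks for), the envelope detector is a simple rectify-and-smooth follower and not necessarily the method in the cited book chapter, and the names `bandpass`, `envelope`, and `envelopes` are hypothetical.

```python
import math


def bandpass(signal, f0, q, fs):
    """Second-order IIR band-pass (RBJ cookbook form, unity gain at f0)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for this filter
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        # Direct form I difference equation, normalized by a0.
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope(signal, tau, fs):
    """Full-wave rectify, then smooth with a one-pole low-pass (time constant tau s)."""
    c = 1 - math.exp(-1 / (tau * fs))
    e = 0.0
    out = []
    for x in signal:
        e += c * (abs(x) - e)
        out.append(e)
    return out


fs = 8000
fvec = [500, 1000, 2000]                        # hypothetical band centers in Hz
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]
# One envelope trace per band (band-major, i.e. the transpose of the NxM "BPdata").
envelopes = [envelope(bandpass(tone, f0, 4.0, fs), 0.005, fs) for f0 in fvec]
```

With the 1 kHz test tone, the trace for the 1000 Hz band settles well above the other two, which is the behaviour the waterfall display is meant to show. Q, the time constant, and the band centers are placeholders that would need tuning for speech.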
Reply by ●August 12, 2004
Jerry Avins wrote:
> Richard Owlett wrote:
>
>   ...
>
>> The question of this thread should be rephrased.
>> Can a batch of z filter banks be computationally/programmatically
>> more efficient than doing an FFT and appropriately summing the z
>> intervals?
>
> What is a Z interval?
>
> Jerry

z is just a number.
I would just "chop" fft bins into z groups such that

grouping    bins
   1        n(1)--n(2)
   2        n(3)--n(4)
   |            |
   z        n(m)--n(m+1)
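One way to arrive at such a table of bin groups for sub-octave bands is sketched below in Python. It is an illustration only: the function name `band_edges`, the choice of three bands per octave, the 100 Hz to 6400 Hz range, and the 4096-point FFT are assumptions made for the example, and the groups are made contiguous (each group starts one bin after the previous one ends), which the post does not specify.

```python
import math


def band_edges(f_low, f_high, bands_per_octave, fs, n_fft):
    """Return inclusive (first_bin, last_bin) pairs, one per sub-octave group."""
    bin_hz = fs / n_fft                         # frequency spacing of FFT bins
    ratio = 2.0 ** (1.0 / bands_per_octave)     # upper/lower edge ratio per band
    groups = []
    lo = f_low
    while lo < f_high:
        hi = min(lo * ratio, f_high)
        first = math.ceil(lo / bin_hz)
        last = math.ceil(hi / bin_hz) - 1       # band covers [lo, hi) in frequency
        if last >= first:                       # skip bands narrower than one bin
            groups.append((first, last))
        lo *= ratio
    return groups


# Hypothetical settings: CD sample rate, 4096-point FFT, third-octave groups.
groups = band_edges(100.0, 6400.0, 3, 44100, 4096)
for number, (first, last) in enumerate(groups, start=1):
    print(number, first, last)
```

Each printed row corresponds to one line of the grouping table, and the list can be passed directly to a routine that sums squared bin values per group. At low frequencies a short FFT leaves very few bins per group, so the window length limits how fine the sub-octave split can be.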






