DSPRelated.com
Forums

Speech Recognition using Butterworth filters

Started by Mandar Gokhale December 27, 2007
Vladimir Vassilevsky wrote:
> I have a problem: nobody likes me.
>
> VLV
On Dec 28, 11:50 am, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> robert bristow-johnson wrote:
>
> > it's funny, but they kicked me outa Wikipedia (specifically banned me
> > from editing from my account User:rbj)
>
> And you are still sad because of that nonsense?
who said anything about being sad? or nonsense?
> > because of rude or disruptive
> > or some excuse like that.
>
> IIRC it had something to do with gays and/or evolutionists, right?
more like having to do with people who want Wikipedia to be specifically gay-friendly and anti-ID, as if both are neutral POVs. it's like their shit don't stink because they are on the right side.
> > but i never said GFY to anyone.
>
> You should. It is such a pleasure to call things with their right names.
it was never about me gratuitously telling people what i thought of them. it was, for me, about making Wikipedia truly neutral in its presentation of some controversial subjects and not allowing it to be used to reflect the opinion of some politically correct interest groups as if it were neutral fact. but, as people here know, i usually don't back away from a (word) fight when i am confident of the facts and principles of an issue. but it's like taking on a moderator of a moderated newsgroup. sometimes it isn't who's right on the facts, but whose stick is bigger, that rules the day. not being an admin, i had no stick.
> A man comes to a psychoanalyst consultant:
well, John Monro took care of that.

Vlad, there may be a cultural disconnect here, i dunno, but i would hope to think that not all Russians are arrogant and rude. it's probably the same as with Americans.

r b-j
"robert bristow-johnson" <rbj@audioimagination.com> wrote in message
news:96d8c8f0-3a25-40d8-9a7a-ba13624cf6b8@e25g2000prg.googlegroups.com...

>>> it's funny, but they kicked me outa Wikipedia (specifically banned me
>>> from editing from my account User:rbj)
>> And you are still sad because of that nonsense?
> who said anything about being sad? or nonsense?
Since you keep mentioning that, I assume it is important to you. Why?
>> IIRC it had something to do with gays and/or evolutionists, right?
> more like having to do with people who want Wikipedia to be
> specifically gay-friendly and anti-ID, as if both are neutral POVs.
> it's like their shit don't stink because they are on the right side.
If somebody wraps his hate and bigotry in politically correct manners, it still looks like hate and bigotry wrapped in politically correct manners. BTW, I consider hate and bigotry natural, nothing to be ashamed about. I am with you on the first question, can't agree on the second one though.
>>> but i never said GFY to anyone.
>> You should. It is such a pleasure to call things with their right names.
> it was never about me gratuitously telling people what i thought of
> them, it was, for me, about making Wikipedia truly neutral in its
> presentation of some controversial subjects and not allowing it to be
> used to reflect the opinion of some politically correct interest
> groups as if it were neutral fact. but, as people here know, i
> usually don't back away from a (word) fight when i am confident of the
> facts and principles of an issue.
Many good words and not a bit of doubt in the case :) Although the whole thing could look different from the other side.
> but, it's like taking on a
> moderator of a moderated newsgroup. sometimes it isn't who's right on
> the facts, but whose stick is bigger, that rules the day. not being
> an admin, i had no stick.
HaHaHa. I've done that, too, and with the same result. :)
>> A man comes to a psychoanalyst consultant:
> well, John Monro took care of that.
Unfortunately, he is plonked. If you see him, please let him know that he is free to go anywhere and do anything with himself, as long as the equal-opportunity, socially conscious, environmentally friendly, non-discriminatory and other restrictions apply.
> Vlad, there may be a cultural disconnect here, i dunno, but i would
> hope to think that not all Russians are arrogant and rude. it's
> probably the same as with Americans.
It is a *big* mistake to make country-wide assumptions.

VLV
There are a couple of "microcontroller design projects" from Cornell
University that can give you the kind of information you are looking for:
http://instruct1.cit.cornell.edu/courses/ee476/FinalProjects/s2007/mdr32_as435/mdr32_as435/mdr32_as435_Final_Report.htm
http://instruct1.cit.cornell.edu/courses/ee476/FinalProjects/s2006/avh8_css34/avh8_css34/index.html

HTH and good luck with your project!

:-)

Emilio Monti


People,

Thanks for the responses to my question... couldn't contribute to the
conversation because of a misbehaving internet connection... anyway,
here are my responses.



@Richard Owlett - thanks for the info about comp.speech.research... and
I know it's a very complex project... just doing it out of academic
interest... not like my future depends on it :) As for what you asked...

> 1) A question though. Do you really need something that would be
> recognized as "speech recognition"? Do you need to recognize words or
> perhaps sounds that can be produced by a voice?

I'm aiming for the chip to recognize four commands given by a single
voice... say mine. I don't specifically require it to respond (or not
respond) to other people's voices using the same commands... does that
make it clear enough?


@Hardy Spicer

You wrote:

> I would use cross-correlation myself if the processor was fast enough.
> The trouble with using filters on their own will be that it responds
> to bullshit commands with one on your list.

Could you make that clearer? I mean, I know the Fourier transform of
the correlation would give me the power spectral density... but it'll
be slightly different every time the word is spoken, right? I checked
my voice saying the same words in spectrum-analysing software called
Audacity, and the frequency spectrum was slightly different every time
I said the same word... that's why I'm looking for a method to
recognize the commands properly.

and lastly...
@Vlad...
Vladimir Vassilevsky wrote:

> I have a problem: nobody likes me.


No wonder, man... why don't you go and showcase your 'electronic
machismo' somewhere else on Usenet? There are plenty of low-usage
groups for you to do that...
> I'm aiming for the chip to recognize four commands
Google for old products:

* VRC008 by Interstate Corp., 1981. Single-chip / speaker-independent /
  16 words. It was basically a Motorola MC6804P2 8-bit controller with
  firmware in the 1.2 kByte ROM; RAM was 64 bytes. A description of the
  software is in US Patent 4,388,495. Not much on the analog frontend
  there. There was an evaluation kit, ELV008, with 12 words, but the
  VRC008 probably never got into production.

* Remake: VCP200 by Voice Control Products Inc., early 90ies. Sold by
  Archer, from time to time on eBay. You guessed it: a Motorola
  MC6804J2, 1 kByte ROM, 30 bytes RAM. Speaker-independent fixed
  vocabulary of 2-5 words for robots: "yes on ; on off ; go turn-right
  left-turn reverse stop". The analog circuit is in the datasheet.

* Remake of the remake: Stewart, "Low-cost Voice Recognition", Circuit
  Cellar Ink, Feb. 1998. You will find the article on the web, plus the
  Intel hex for the controller. As usual: a Motorola MC68HC705J1A,
  1 kByte ROM, 64 bytes RAM. This time a serial external EEPROM, a
  Ramtron 24LC04 (exotic: it was selected because external "RAM" was
  needed), was used to store the vocabulary. Therefore no longer
  speaker-independent but trainable (i don't think that's a good
  choice).

* Remake of the remake (of the remake): "Mandar Gokhale V0.1". Yes,
  almost 10 years are gone; it's time for another one. Please note that
  Motorola is now called Freescale, but the 68HCS908 is waiting...

The analog frontend in all of these variants is more or less identical.
After (double) differentiation it produces a zero-crossing bitstream,
so the controller did not need an A/D converter, only a timer that was
controlled by a port pin.

MfG
JRD
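The zero-crossing frontend JRD describes can be imitated in software to get a feel for what those chips actually worked with. A minimal Python sketch (plain lists, no DSP libraries); the 100 Hz test tone is only an illustration, not data from any of the products above:

```python
import math

def zero_crossings(samples):
    """Count sign changes in a sampled signal -- a software stand-in for
    the comparator-style zero-crossing frontend described above."""
    count = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count

# A pure tone crosses zero twice per cycle, so the zero-crossing rate
# tracks frequency -- the kind of cue a timer on a port pin can measure.
tone = [math.sin(2 * math.pi * 100 * n / 8000 + 0.1) for n in range(8000)]
```

A real frontend sees this bitstream directly from a comparator; counting transitions per frame is then the cheapest possible "spectral" feature.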
On Dec 28 2007, 7:01 pm, robert bristow-johnson
<r...@audioimagination.com> wrote:

> it was never about me gratuitously telling people what i thought of
> them, it was, for me, about making Wikipedia truly neutral in its
> presentation of some controversial subjects and not allowing it to be
> used to reflect the opinion of some politically correct interest
> groups as if it were neutral fact. but, as people here know, i
> usually don't back away from a (word) fight when i am confident of the
> facts and principles of an issue. but, it's like taking on a
> moderator of a moderated newsgroup. sometimes it isn't who's right on
> the facts, but whose stick is bigger, that rules the day. not being
> an admin, i had no stick.
Since we're on the subject, I remember reading a Wikipedia article on some subject of sexuality a while back (which one specifically, I can't remember), but it was pretty obvious that it was written by what one might call, in 2007, a sexual pervert/deviant, who might or might not have been homosexual. [Note that it was not my opinion that the person was a pervert, nor do I have any contempt for homosexuality.] What was offensive was the style of what the person wrote. There is no way that Encyclopaedia Britannica would have allowed the individual to write what he wrote.

Instead of presenting a balanced point of view on the subject matter, it was more like a cookbook on how to bust the biggest nut while doing whatever was being described. It was also pretty obvious that the author was saying, "Look, if you're thinking about doing these things, you should, and here's how to do it, and don't worry about the rest of the world." I think what makes this offensive is that a person starts reading the article hoping to gain perspective from a knowledgeable person on the subject, only to discover that they have been lured into an unsolicited, one-sided conversation intended to seduce the reader into imagining an experience that might be considered vulgar to him/her in real life.

So I would have to agree with RBJ. While many Wikipedia articles are excellent, one has to screen inherently subjective topics and be careful of those who would use it as their own medium for insidiously projecting their ideas into the minds of unsuspecting readers.

-Le Chaud Lapin-
On Jan 12, 10:51 pm, Mandar Gokhale <stallo...@gmail.com> wrote:

>> "The trouble with using filters on their own will be that it responds
>> to bullshit commands with one on your list."
>
> Could you make that clearer? I mean, I know the Fourier transform
> of the correlation would give me the power spectral density... but
> it'll be slightly different every time the word is spoken, right? I
> checked my voice saying the same words in spectrum-analysing
> software called Audacity, and the frequency spectrum was slightly
> different every time I said the same word... that's why I'm
> looking for a method to recognize the commands properly.
Disclaimer: what little I know about SR came from a girl I used to date... so these are just suggestions. :D

He means that if you simply use a matched filter (http://en.wikipedia.org/wiki/Matched_filter) or another technique against the time-domain signals, you will have trouble because, well... you're in the time domain. Two superposed utterances of the input signal, x1[n] and x2[n], would look drastically different depending on their relative phases, which are influenced by when you start sampling. Even a small phase shift between x1[n] and x2[n] will break your algorithm.

Yes, the spectrum will indeed be slightly different each time, never exact, but that's ok, as you simply need to distinguish between the utterances. There are many ways to do this. Perhaps the easiest is to regard each |X[k]| of the DFT of the auto-correlation as a component of a vector. There will be one vector associated with each utterance. You would get the user to utter the same word several times to find, more or less, the |X[k]|'s for a single word. This would involve normalizing each DFT based on energy content (yelling versus whispering the same word), and finding X*[k], the signal that, when regarded as a normalized vector among the other normalized vectors, yields the minimum distance between itself and any of the other vectors. Of course this is the distance formula in N-space among the vectors. After that, when a word is uttered, you run through your bank of X*[]'s and yield the index of the one that provides the minimum distance. That will be the index of the uttered word (hopefully).

You can see that you will need to calculate the proper window for the DFT correctly. If you simply tell the user, "Ok, I'm ready, speak," and nothing is said until the user takes the gum out of his/her mouth, you will start sampling prematurely and stop sampling prematurely, so you will have to determine when significant energy begins in the signal and when it ends.
If I were you, before using a PIC, I would write a few programs in software to do your experiments. On Unix or Windows, there are plenty of pre-installed tools to sample audio into a variety of formats, do your processing, see what works, check error rates, etc. Once you find something you are comfortable with, you can move to hardware with an optimized algorithm.

-Le Chaud Lapin-
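The spectrum-matching idea above can be sketched in a few lines of Python. The toy sine "words", the 64-sample frames, and the direct O(N^2) DFT are illustrative assumptions for a desktop experiment, not a recommended implementation:

```python
import cmath
import math

def magnitude_spectrum(x):
    """|X[k]| of the DFT, computed directly (O(N^2) -- fine for a sketch)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N)]

def normalize(v):
    """Scale a vector to unit length, so yelling vs. whispering the
    same word gives (nearly) the same vector."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def classify(utterance, templates):
    """Index of the template whose normalized magnitude spectrum is
    closest, in Euclidean distance, to that of the utterance."""
    u = normalize(magnitude_spectrum(utterance))
    def dist(t):
        v = normalize(magnitude_spectrum(t))
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(range(len(templates)), key=lambda i: dist(templates[i]))
```

Because only |X[k]| is compared, a louder or phase-shifted repetition of the same "word" still matches its template, which is exactly why the spectrum is a better place to compare than the raw time-domain signal.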
First, the one who should GFY is Vassily, for being such a rude,
ignorant idiot. If you do not know anything about speech recognition,
just shut up.

This question on speech recognition has many possible answers.

Please first define the domain. I will assume you are only trying to
recognize a few words, so you will be doing limited-vocabulary
recognition. In this case, your best bet for speech recognition would
be neural networks. You train the neural network with a few samples of
the intended words.

However, it is not that simple, and I will explain why. First, as
somebody already said, you should do DTW (dynamic time warping) to
normalize the length of the utterance. Afterwards, you should do
cepstral analysis to obtain a feature vector to feed your neural
network. A simple PIC might not suffice.

The preprocessing stage, with the filters and such, is just for
increasing robustness and getting rid of the information we are not
interested in.

I recommend reading the introductory part of the HTK Book to understand
hidden-Markov-model-based speech recognition. The book is available
for free (after a simple registration) at http://htk.eng.cam.ac.uk/.

Another solution would be looking for specialized speech-recognition
ICs, but I have not tried them, and maybe they are not cheap or readily
available.

So basically your system should consist of this, in this order,
connected in cascade:

* signal acquisition (microphone)
* bandpass filter (can be a Butterworth filter) between 100 Hz and
  4000 Hz (the rest is redundant)
* A/D converter, sampling at least at 8 kHz (recommended)
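For prototyping the band-limiting stage in software first, one can sketch it as a single second-order band-pass biquad in Python. Note the hedges: a true Butterworth band-pass would be a higher-order cascade, the coefficients below follow the common "cookbook" biquad form, and the 16 kHz sample rate is an arbitrary choice for the example:

```python
import math

def bandpass_biquad(fs, f_lo, f_hi):
    """Coefficients (b, a) for a 2nd-order band-pass biquad, cookbook
    form, with 0 dB peak gain at the geometric-mean centre frequency.
    Only an approximation of the Butterworth band-pass stage above."""
    f0 = math.sqrt(f_lo * f_hi)     # geometric-mean centre frequency
    q = f0 / (f_hi - f_lo)          # Q from the desired bandwidth
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a

def filter_signal(b, a, x):
    """Direct-form I difference equation for one biquad."""
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(3) if n - i >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, 3) if n - i >= 0)
        y.append(acc)
    return y
```

An in-band tone (a few hundred Hz) passes near unity gain, while a 20 Hz rumble tone is clearly attenuated; the shallow single-biquad skirts are why a real design would cascade sections.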

Once the signal is in the microprocessor, the first thing you should
do is voice activity detection (VAD). There are some algorithms for
this; please google it.
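One of the simplest VAD algorithms is a frame-energy threshold; here is a toy Python sketch. The frame length and threshold are arbitrary assumptions, and real VADs also track the noise floor and add hangover frames:

```python
import math

def detect_utterance(samples, frame_len=160, threshold=0.02):
    """Toy energy-based VAD: return (start, end) sample indices of the
    region whose frame RMS exceeds `threshold`, or None if silent."""
    active = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(v * v for v in frame) / frame_len)
        if rms > threshold:
            active.append(i)
    if not active:
        return None
    return active[0], active[-1] + frame_len

# Hypothetical test signal: silence, a 0.4 s "word" (a tone), silence.
sig = ([0.0] * 1600
       + [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(3200)]
       + [0.0] * 1600)
```

The returned span is what gets handed to the time-warping and cepstral stages below it in the cascade.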

Once you have detected the beginning and the end of an utterance, you
should do dynamic time warping to normalize its length, so it can be
compared.
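The classic DTW recurrence can be sketched in Python on scalar sequences for brevity; in a real recognizer each element would be a cepstral feature vector and the local cost a vector distance:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences: the minimum
    accumulated local cost over all monotonic alignments."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # stretch a
                                 d[i][j - 1],      # stretch b
                                 d[i - 1][j - 1])  # match
    return d[n][m]
```

A slowly spoken repetition of the same contour aligns with zero cost against its fast version, which is exactly the length normalization wanted here.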

Then do framing and obtain the cepstral coefficients.
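The cepstral step can be illustrated with the real cepstrum, i.e. the inverse DFT of the log magnitude spectrum. Practical recognizers usually use mel-frequency cepstral coefficients (MFCCs) instead, and the direct O(N^2) transforms here are only for clarity:

```python
import cmath
import math

def dft(x):
    """Direct DFT (no FFT, for readability)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

def real_cepstrum(frame, num_coeffs):
    """First `num_coeffs` real-cepstrum values of one frame:
    inverse DFT of the log magnitude spectrum."""
    N = len(frame)
    log_mag = [math.log(abs(c) + 1e-6) for c in dft(frame)]  # floor avoids log(0)
    cep = [sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / N)
               for k in range(N)).real / N
           for n in range(N)]
    return cep[:num_coeffs]
```

Only the first handful of coefficients is kept per frame; the resulting per-frame vectors are what the time warping and the classifier operate on.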

Feed your neural network and wait for the result.

Of course, first you will need to train the neural network.
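As a stand-in for that training loop, here is a single artificial neuron (a perceptron) trained on made-up 4-dimensional "feature vectors". The data, dimensions, and learning rate are purely illustrative assumptions; a real recognizer would use a multi-layer network on the cepstral features:

```python
def train_perceptron(data, epochs=100, lr=0.1):
    """Train one artificial neuron on (features, label) pairs with the
    perceptron rule -- the simplest instance of train-then-classify."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, label in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = label - pred  # -1, 0 or +1
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hypothetical feature vectors (think: a few cepstral coefficients)
# for two command words; real features come from the chain above.
training = [([1.0, 0.2, 0.1, 0.0], 0), ([0.9, 0.1, 0.2, 0.1], 0),
            ([0.0, 0.1, 0.2, 1.0], 1), ([0.1, 0.0, 0.1, 0.9], 1)]
```

For linearly separable data like this toy set, the perceptron rule converges and then classifies every training sample correctly.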

If you have more doubts, do not hesitate to ask.

Regards

Juan Pablo
On Jan 14, 3:55 am, jnarino <jnar...@gmail.com> wrote:

> Once you have detected the beginning and the end of an utterance, you
> should do dynamic time warping to normalize its length, so it can be
> compared.
Dynamic time warping on the time-domain signal? Have you tried that?

Dirk