
Voice recognition using TMS

Started by andrecr03, 7 years ago. 14 replies, latest reply 7 years ago. 254 views.
Hello,

My name is André, and I'm currently studying Electrical Engineering. I signed up for this forum because I have an important project to complete over my holidays for a DSP course, due January 25th!

So, here it is: I have a TMS320C5515 DSP board from Texas Instruments, and I have to build a voice recognition system, using Code Composer Studio v4.1.3, capable of making LEDs on a breadboard blink in real time. For instance, I say "green" and the green LED lights up (just a flag for future applications).

I'm looking for tutorials, videos, guides, or example algorithms so I can study the methods of detecting, sampling, and processing the voice signal. I pretty much don't have a solid idea where to start.

Thank you in advance!
Reply by MichaelRW on January 17, 2017

Hello, André:

I would suggest that you approach this project in two steps.

First, write code that will get the DSP board to act "transparently." This means the DSP board should convert an input analogue signal such as speech into its digital format via the ADC, then pass this digital signal directly to the DAC, where you can monitor the analogue output using a speaker. By doing this you'll know that the board is operating correctly and that the speech is properly converted into its digital representation. Knowing that the speech signal is properly represented in the digital domain will allow you to process it so that you can turn the DSP board LEDs on or off as required.

In most cases, DSP boards such as the one you have will come with a bundled set of baseline code. Using this code will help you get the board working as described above. The baseline code is critical because it configures the board (i.e. sampling rate and so forth) and puts it into a running state.

Second, write code that will "convert" the target-word (i.e. the spoken word "green") into a signal that you can use to control the LEDs.  I would suggest developing this algorithm in a higher-level language such as Matlab.  You can deal with the conceptual aspects of your development more readily in Matlab than by developing directly in Code Composer Studio.  Once you have a functional algorithm in Matlab, you can go through the process of converting it to C/C++, which is likely the language used in Code Composer Studio.

I have not developed such an application before.  However, here is a short-list of reference material that you may find helpful:

Feature Extraction

Developing an Isolated Word Recognition System in Matlab

Agrawal_Raikwar_2016.pdf

I hope this helps with your project.  Have fun!


Michael.

Reply by Tim Wescott on January 17, 2017

I would start even further back than that: just make the LED blink.  Then make the board act as a transparent conductor of audio.  I've found that when you hand a brand new, unknown (to the engineer) board to an embedded engineer, you can almost measure how long they've been in the business by how dead-simple the first thing they do is: newbies always want to get the whole app written and then try it; experienced guys will start out with a 10-line "main" function that just blinks an LED using a software timing loop; guys with a year or two under their belts will try something in between.
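
For illustration, a bare-bones blink loop in C might look like the sketch below. GPIO_SetLed() and the delay count are placeholders, not real library calls; on the C5515 you would substitute the GPIO/LED routines from the board support code that ships with the kit.

/*
 * Minimal "hello world" blink, in the spirit of the suggestion above.
 * GPIO_SetLed() and the delay constant are stand-ins for whatever the
 * board support library actually provides.
 */
#include <stdint.h>

extern void GPIO_SetLed(int led, int on);   /* hypothetical BSP call */

int main(void)
{
    volatile uint32_t i;

    for (;;) {
        GPIO_SetLed(0, 1);                  /* LED on  */
        for (i = 0; i < 500000; i++) ;      /* crude software delay */
        GPIO_SetLed(0, 0);                  /* LED off */
        for (i = 0; i < 500000; i++) ;
    }
}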

The "algorithm" to make the ADC talk to the DAC is:

  • Read the ADC at a constant sampling rate
  • Write to the DAC every time you read from the ADC


That's why folks are saying "there is no algorithm".
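
As a rough C sketch of that pass-through loop (read_adc_sample() and write_dac_sample() are stand-ins for whatever codec driver calls the board's example code provides; the names here are made up):

/*
 * Sketch of the "transparent" pass-through loop described above.
 * The driver calls are hypothetical placeholders for the codec
 * (I2S/AIC) routines in the board's example framework.
 */
#include <stdint.h>

extern int16_t read_adc_sample(void);       /* blocks until the next sample */
extern void    write_dac_sample(int16_t s);

void passthrough_loop(void)
{
    for (;;) {
        int16_t s = read_adc_sample();   /* one sample per tick of the sample clock */
        write_dac_sample(s);             /* echo it straight back out */
    }
}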

Reply by andrecr03 on January 17, 2017

Thank you very much, Tim!
Actually, I'm a newbie in DSP and I don't know how to properly read the "ADC at a constant sampling rate", make the TMS read the microphone, and so on... I couldn't find any tutorial on the internet, and I really need to understand the basics before anything else. Can you help me with that?

Reply by Tim Wescott on January 17, 2017

Strictly speaking, you're dealing with a combination of DSP and embedded programming.  The DSP part is doing the math and coming up with the algorithm; the embedded programming part is making it work on that board (instead of, for instance, a PC).

Given what you just said, I'm going to change my recommendation on how to start.  Someone mentioned this in passing, but I'm going to go into more detail here:

These evaluation boards always come with a bunch of example applications that show you how to exercise the hardware.  From my perspective as a guy in the biz for over 25 years who wants to make stuff that'll be maintainable, they're not well thought out.  But they usually work as stand-alone apps fairly well.

So if there's a built-in microphone then there should be an application that comes with the board that reads audio from that microphone and does something with it.  It should come as both a binary that you can just load and run, and as a full project, already set up for the tools, that you can build, and then load and run.

For you, I would say to start with one of those apps.  If you're really lucky, there's a color-organ app or something similar, that'll actually make the lights do something in response to sound -- if that's the case, then the "only" (and I use that diminutive with tongue in cheek) thing you have to do is to replace the stuff in the middle with your speech recognition code.

Or, if you get desperate, and there's a color-organ app, use it and say "green" in a really high-pitched voice and "red" in a low growly voice.  With luck you'll have "speech recognition" good enough to fool the prof -- or perhaps make him laugh enough that he won't summarily evict you from the class.

Unless the board is horribly old, TI will have a web page for it, and on that web page will be a .zip file, a .iso file, a .exe install file, or some combination. There should be everything you need to start using the board, including binaries for all of the projects. Included in that file (or set thereof) should be some really basic tutorials to load the example programs and get them running (sometimes a particular example program will need certain jumpers set).

Start there, and keep in touch.

Reply by andrecr03 on January 17, 2017

Thank you very much for your response!
I'll keep in touch.
Someone sent me this material: 
https://e2e.ti.com/group/universityprogram/educato...

It's very good and contains some algorithms and tutorials, in case anyone else is looking for the same thing I was!


Reply by Treefarmer on January 17, 2017

If you don't have access to MATLAB, you might try looking at the waveforms of the various words you need to recognize. You can get inexpensive software such as Sound Forge Audio Studio 10 or Magix Rescue that will display the waveform recorded from the microphone plugged into your sound card. See if you can recognize patterns in the attack and decay times of different words. Try recording the word "green" in the left channel, then in the right channel. Time-align the two channels using cut/paste and then combine them into monaural. Do it again to get several samples. Save the combo. Then do it for the word "red." Your algorithm could be to sample the various amplitude levels along the recordings. For example, "red" might have a fast decay time at the end whereas "green" might have a slow decay time. (I don't really know without trying it.)
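
To make that concrete, here is a rough C sketch of the idea: smooth the absolute value of the signal into an envelope, then measure how long the envelope takes to fall below a fraction of its peak. The smoothing constant and the 10% threshold are arbitrary values you would tune against your own recordings.

/*
 * Rough illustration of the attack/decay idea: a smoothed amplitude
 * envelope plus a crude "decay time" measurement after the peak.
 */
#include <math.h>
#include <stddef.h>

/* One-pole smoothed |x| envelope; alpha near 1.0 means a slower envelope. */
static void envelope(const float *x, float *env, size_t n, float alpha)
{
    float e = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = fabsf(x[i]);
        e = alpha * e + (1.0f - alpha) * a;
        env[i] = e;
    }
}

/* Samples from the envelope peak until it falls below 10% of the peak. */
static size_t decay_samples(const float *env, size_t n)
{
    size_t peak_i = 0;
    for (size_t i = 1; i < n; i++)
        if (env[i] > env[peak_i]) peak_i = i;

    float thresh = 0.1f * env[peak_i];
    for (size_t i = peak_i; i < n; i++)
        if (env[i] < thresh) return i - peak_i;
    return n - peak_i;   /* never decayed below the threshold */
}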

A more complex method is to use MATLAB to do a Fast Fourier Transform (FFT) to find frequency concentrations in one word vis a vis another word.
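
On a small DSP you don't even need a full FFT for that kind of comparison; a single-bin Goertzel computation gives the energy around one chosen frequency. A sketch in C, with the target frequency left as something you would pick from your own spectra:

/*
 * Energy of the DFT bin nearest target_hz over n samples at fs Hz,
 * computed with the Goertzel recursion instead of a full FFT.
 */
#include <math.h>
#include <stddef.h>

static float goertzel_power(const float *x, size_t n, float target_hz, float fs)
{
    int   k     = (int)(0.5f + (n * target_hz) / fs);
    float w     = 2.0f * 3.14159265f * (float)k / (float)n;
    float coeff = 2.0f * cosf(w);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f;

    for (size_t i = 0; i < n; i++) {
        s0 = x[i] + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;   /* |X[k]|^2 */
}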

As Tim Wescott pointed out, such simple methods will only work for one speaker and for a limited number of words. For multiple speakers using continuous speech you will get very deeply into the field of Artificial Intelligence -- and that is probably where you want to explore a career path assuming the smart machines don't beat you to it themselves :-)

Reply by andrecr03 on January 17, 2017

Thank you very much, Michael!
I'll take a look at those materials you gathered for me!

Do you know of any algorithm that I can use as a base for testing the DAC and ADC, as you suggested?

Reply by DaniloDara on January 17, 2017

No algorithms, just capture the input and feed the DAC with that.
A simple circular buffer will be more than fine.

Reply by MichaelRW on January 17, 2017

As @DaniloDara has said, no specific algorithm is necessary to accomplish this task.  However, you will need to write some code that will transfer the digital samples from the ADC to the DAC.  A circular buffer is one way to accomplish this transfer.
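
For example, a minimal circular buffer between the codec's receive and transmit sides could look like the C sketch below. The on_rx_sample()/on_tx_sample() hooks are hypothetical; they stand in for the codec interrupt handlers in the board's example framework.

/*
 * Small power-of-two circular buffer moving samples from the ADC side
 * to the DAC side. The handler names are placeholders.
 */
#include <stdint.h>

#define BUF_SIZE 256                    /* must be a power of two */

static volatile int16_t  buf[BUF_SIZE];
static volatile uint16_t wr = 0, rd = 0;

/* Called when a new input sample arrives from the codec (ADC side). */
void on_rx_sample(int16_t s)
{
    buf[wr & (BUF_SIZE - 1)] = s;
    wr++;
}

/* Called when the codec wants the next output sample (DAC side). */
int16_t on_tx_sample(void)
{
    if (rd == wr)
        return 0;                       /* underrun: output silence */
    return buf[rd++ & (BUF_SIZE - 1)];
}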

You will need the C/C++ code that initialises the board and provides you with a framework to insert your code.

Try http://www.ti.com/product/TMS320C5515.

Reply by drmike on January 17, 2017

Start here:

http://www.ti.com/general/docs/litabsmultiplefilel...

Use the phrases you find in that document to search the web for more documents.  This is not a simple task, so you won't be getting much sleep in the next 2 weeks.  But you will have a lot of fun!


Patience, persistence, truth,

Dr. mike

Reply by andrecr03 on January 17, 2017

Thank you very much, Dr. Mike!
I'll take a look!

Reply by MichaelRW on January 17, 2017

I think Dr. Mike's suggestion is the way to proceed.

Reply by rt45aylor on January 17, 2017

I'm personally most comfortable trying things like this out in MATLAB before converting them to C and working out the programming syntax. You might try recording your own voice saying a few different words like "red" and "green", taking a couple of recordings of each word, and see if you can notice similarities and differences between different recordings of the same word in both the time and frequency domains. Then see if you can apply some of the knowledge from the useful links others have posted. MATLAB has a few tutorials on this, but the general concepts all revolve around the idea of matched filtering, which sounds like it might be a foundation your professor is trying to instill with this project.
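
As a bare-bones illustration of the matched-filter idea in C: compute the normalised cross-correlation between a stored template of a word and an incoming recording of the same length (assumed roughly time-aligned), and compare the score against a threshold you tune yourself. The function name and the thresholding scheme are mine, not from any library.

/*
 * Normalised cross-correlation between a stored word template and an
 * input recording of the same length. A score near 1.0 means a close match.
 */
#include <math.h>
#include <stddef.h>

static float correlation_score(const float *templ, const float *x, size_t n)
{
    float dot = 0.0f, et = 0.0f, ex = 0.0f;

    for (size_t i = 0; i < n; i++) {
        dot += templ[i] * x[i];
        et  += templ[i] * templ[i];
        ex  += x[i] * x[i];
    }
    if (et == 0.0f || ex == 0.0f)
        return 0.0f;
    return dot / sqrtf(et * ex);
}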

Reply by Tim Wescott on January 17, 2017

Be careful, in choosing what to do, to bite off as small a chunk as you can to get a grade.  Voice recognition is not trivial in the least.

Have you checked to see if TI has a voice recognition library available for your chip?  If they do, and you don't have to pay for it, use it.  Unless you've been sitting on this project for months, getting a vendor-supplied software base running on the board is about right for the time you have.

A second choice (assuming it's allowed by your prof) is to find someone else's example apps -- but your mileage will definitely vary; the experience of finding random code on the Internet and making it work is dreadful more often than not.

I'm not sure what to tell you if you're expected to roll your own.  A small vocabulary is easier (i.e., "red" and "green").  A limited number of speakers (like, say, just you) is easier.  Arbitrary, speaker-independent phrase recognition is freaking hard -- Google does it, but it took them a decade, and they probably had hundreds of people working on it.