Speech modulation using LPC

Started by staseer October 30, 2009
Hi,

I am trying to produce a speech-like signal from arbitrary values by
using the LPC technique. My goal is to convert any data into a speech-like
signal, send it to the remote end, and then reconstruct the data from the
speech signal. I am basing my ideas on a paper.

I am new to DSP but I am trying my best to grasp it.
Data -> LPC Decoder -> TX speech like signal  -> Rx speech like signal ->
LPC encoder -> Data
I have read about LPC and the rationale behind it. To start with, I just
ran a simple LPC decoder using already stored coefficients and residue,
and got the same data back. The problem came when I tried to reconstruct
the residual signal: passing simple white noise through the coefficients
did not give the original coefficients back.
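
For reference, the analysis step (estimating coefficients from a frame of samples) is commonly done with the autocorrelation method followed by the Levinson-Durbin recursion. The sketch below is a generic textbook version, not necessarily what the paper uses; the function names and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are assumptions made for this example. Because the estimate is driven by the autocorrelation of the whole frame, a finite burst of white noise through the synthesis filter will usually give coefficients that are only close to the stored ones.

```python
def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for a[0..order], with a[0] == 1.

    Assumed convention: A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    Returns (a, err), where err is the final prediction error energy.
    Assumes r[0] > 0 (a silent frame would divide by zero).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err
```

A typical call would be `levinson_durbin(autocorrelation(frame, 10), 10)` for a 10th-order model.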

I thought I would use 10 bits for the index into the coefficient codebook, 4
bits for pitch and 2 for gain. I have to use a periodic sine wave for the
synthesizer input to avoid triggering the VAD module.
I was thinking of using 16 different sine wave frequencies for
reconstruction of pitch and gain, and then using a table of 1024 coefficient
sets to modulate it. At the Rx side I can pass the received signal through LPC
to get the coefficients and map them to get the index. I can take an FFT to get
the pitch back (can anyone suggest a better idea?), and the gain can be
calculated from the amplitude of the sine wave.
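
The 10 + 4 + 2 bit split described above amounts to one 16-bit word per frame. A minimal sketch of packing and unpacking such a word follows; the field order and the function names are assumptions for illustration only.

```python
def pack_frame(lpc_index, pitch_index, gain_index):
    """Pack a 10-bit codebook index, 4-bit pitch and 2-bit gain into 16 bits.

    Assumed layout (hypothetical): [lpc:10][pitch:4][gain:2], MSB first.
    """
    if not 0 <= lpc_index < 1024:
        raise ValueError("lpc_index must fit in 10 bits")
    if not 0 <= pitch_index < 16:
        raise ValueError("pitch_index must fit in 4 bits")
    if not 0 <= gain_index < 4:
        raise ValueError("gain_index must fit in 2 bits")
    return (lpc_index << 6) | (pitch_index << 2) | gain_index


def unpack_frame(word):
    """Inverse of pack_frame: returns (lpc_index, pitch_index, gain_index)."""
    return (word >> 6) & 0x3FF, (word >> 2) & 0xF, word & 0x3
```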

My question is: does this seem logical in terms of DSP and LPC? I have seen
that the waveform given to the synthesizer plays a vital role in the
reconstruction, but in my case I can manipulate them both. Also, I have read
that LPC coefficients can become unstable. Would that be a problem in my case,
given that I will have only predefined coefficients? Also, what is the best way
to construct a codebook (especially in my case) for LPC coefficients?
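
On the stability question: with a fixed, predefined codebook, each entry can be checked once, offline. One standard way is the step-down (backward Levinson) recursion, which converts the direct-form coefficients to reflection coefficients; the synthesis filter 1/A(z) is stable when every reflection coefficient has magnitude below 1. The sketch below assumes the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the function name is hypothetical.

```python
def is_stable(a):
    """Check stability of the all-pole filter 1/A(z).

    a = [a1, ..., ap] for the assumed convention
    A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    Uses the step-down recursion: stable iff every reflection
    coefficient satisfies |k| < 1.
    """
    coeffs = list(a)
    while coeffs:
        k = coeffs[-1]                      # reflection coefficient of this order
        if abs(k) >= 1.0:
            return False
        m = len(coeffs)
        denom = 1.0 - k * k
        # Coefficients of the next lower-order predictor.
        coeffs = [(coeffs[i] - k * coeffs[m - 2 - i]) / denom
                  for i in range(m - 1)]
    return True
```

Running this over all 1024 entries before deployment would flag any unstable codebook vector.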

Thanks and regards,
Saba

staseer wrote:

> I am trying to produce a speech based signal from arbitrary values by
> using LPC technique. My goal is to convert any data into speech like
> signal, send it to remote end and then reconstruct the data from the speech
> signal. I am basing my ideas from a paper.
>
> I am new to DSP but I am trying my best to grasp it.
> Data -> LPC Decoder -> TX speech like signal -> Rx speech like signal ->
> LPC encoder -> Data

So, you are trying to push data through the vocoder path of a cellphone, right? The problem is that you can't synchronize your data codec to the frame rate of the vocoder in between, so direct codebook-to-codebook mapping is not going to work. You have to create artificial "parameters" on the transmit side and extract them on the receive side. The parameter change rate should be below Nyquist, i.e. half of the vocoder's frame rate for the spectral envelope.

> I have read about LPC and the rationale behind it. To start with I just
> ran simple LPC decoder by using already stored coefficients and residue,
> and got the same data back. The problem came when I tried to reconstruct
> the residual signal. With simple white noise passing thru the coefficients
> did not give the original coefficients back.

Yes, this is what is expected. If you need something quick and dirty that simply works via the vocoder path, run DTMF, PSK or FSK at 300 bps.

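
As an illustration of the "quick and dirty" FSK option, here is a minimal continuous-phase binary FSK modulator at 300 bps. The 8 kHz sample rate and the two tone frequencies (980 Hz and 1180 Hz, the V.21 channel 1 pair) are assumptions chosen for the example, not values given in this thread, and the function name is hypothetical.

```python
import math


def fsk_modulate(bits, baud=300.0, fs=8000.0, f_mark=980.0, f_space=1180.0):
    """Return audio samples for a continuous-phase 2-FSK signal.

    bits: iterable of 0/1. A 1 is sent as f_mark, a 0 as f_space
    (assumed mapping). Symbol boundaries are rounded to whole samples
    because fs / baud is not an integer at 8 kHz and 300 baud.
    """
    samples = []
    phase = 0.0
    for i, bit in enumerate(bits):
        freq = f_mark if bit else f_space
        end = round((i + 1) * fs / baud)    # sample index where this symbol ends
        while len(samples) < end:
            samples.append(math.sin(phase))
            phase += 2.0 * math.pi * freq / fs   # phase carries over between symbols
    return samples
```

Keeping the phase continuous across symbol boundaries avoids the clicks that an abrupt tone switch would add to the spectrum.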
> I thought I will use 10 bits for index to codebook for coefficients, 4
> bits for pitch and 2 for gain. I have to use a periodic sin wave for
> synthesizer input to avoid triggering VAD module.

VAD will trigger on a sine wave. You need a dynamically changing spectrum.

> I was thinking of using 16 different sine wave frequencies for
> reconstruction of pitch and gain and then use a table of 1024 coefficients
> table to modulate it. At the Rx side I can pass the rxed signal thru LPC to
> get the coefficients and map them to get the index. I can take fft to get
> the pitch back, (can anyone suggest a better idea?), and gain can be
> calculated by the amplitude of sin wave.
>
> My question is does this seem logical in terms of DSP and LPC. I have seen
> that the waveform given to synthesizer plays a vital role in the
> reconstruction, but in my case I can manipulate them both. Also I have read
> that LPC coefficients can get unstable, would it be a problem in my case?
> for I will have only predefined coefficients? Also what is the best way to
> construct a codebook (esp in my case) for LPC coefficients.

The first problem is that you are trying to tackle the wrong problem in the wrong way :)

The vocoder path reconstructs the pitch and the spectral envelope of the signal, so you have two parameters (and gain) to play with. The pitch is typically updated every ~5 ms, whereas the envelope is updated every ~20 ms. Set the data speed accordingly. To make things more complicated, the pitch, gain and spectral envelope are related, and the vocoder operates differentially, i.e. it encodes the difference between the parameters of the current and the previous frame.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

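
To put rough numbers on "set the data speed accordingly", the sketch below combines the ~5 ms and ~20 ms update periods with the half-the-update-rate limit mentioned earlier in the thread. Applying that limit to the pitch parameter as well, and assuming every bit of the original 10-bit and 4-bit fields survives the vocoder, are optimistic assumptions of this example; the result is an idealized ceiling, not an achievable rate. Function names are hypothetical.

```python
def max_symbol_rate(update_period_ms):
    """Upper bound on parameter changes per second: half the update rate."""
    update_rate_hz = 1000.0 / update_period_ms
    return update_rate_hz / 2.0


def rough_bit_rate(bits_per_symbol, update_period_ms):
    """Idealized bit rate if each parameter change carried bits_per_symbol bits."""
    return bits_per_symbol * max_symbol_rate(update_period_ms)


if __name__ == "__main__":
    # Envelope: ~20 ms frames, 10-bit codebook index (from the original post).
    print("envelope:", rough_bit_rate(10, 20.0), "bit/s ceiling")
    # Pitch: ~5 ms sub-frames, 4-bit pitch field (from the original post).
    print("pitch:   ", rough_bit_rate(4, 5.0), "bit/s ceiling")
```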
>Yes, this is what expected.
>If you need to get something quick and dirty that simply works via
>vocoder path, run DTMF or PSK or FSK at 300 bps.

DTMF goes through reliably, as the cell phone systems carry that as data, but how will FSK at 300 bps work? The symbol time for 2-level FSK is 3.3 ms, but the sub-frame time for most cell phone codecs is 5 ms, and you can't easily sync the data timing to it. That doesn't seem like a good match.

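
The mismatch can be checked with a few lines of arithmetic. This sketch only restates the figures quoted above (300 bps, 5 ms sub-frames); the function names are made up for the example.

```python
def symbol_time_ms(baud):
    """Duration of one symbol in milliseconds."""
    return 1000.0 / baud


def symbols_per_subframe(baud, subframe_ms=5.0):
    """How many symbols fall inside one codec sub-frame."""
    return subframe_ms / symbol_time_ms(baud)


if __name__ == "__main__":
    print(round(symbol_time_ms(300), 2), "ms per symbol")
    print(round(symbols_per_subframe(300), 2), "symbols per 5 ms sub-frame")
```

A non-integer number of symbols per sub-frame means symbol boundaries drift relative to the codec's sub-frame boundaries.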
Steve

steveu wrote:


>>If you need to get something quick and dirty that simply works via
>>vocoder path, run DTMF or PSK or FSK at 300 bps.
>
> DTMF goes through reliably, as the cell phone systems carry that as data

DTMF can go through the vocoder. It gets distorted, but it can still be decoded well.

> but how will FSK work at 300bps work? The symbol time for 2 level FSK is
> 3.3ms, but the sub-frame time for most cell phone codecs is 5ms, and you
> can't easily sync the data timing to it. That doesn't seem like a good
> match.

The fixed codebook provides no fewer than 4 pulses per subframe. This is sufficient to digitize slow-speed PSK or FSK.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
>steveu wrote:
>
>>>If you need to get something quick and dirty that simply works via
>>>vocoder path, run DTMF or PSK or FSK at 300 bps.
>>
>> DTMF goes through reliably, as the cell phone systems carry that as data
>
>DTMF can go through vocoder. It gets distorted, however it still can be
>decoded well.

Handset to network, the DTMF is always turned into a digital signal, and its timing is played around with by the system. Network to handset, the DTMF audio passes through the channel (at least in most cases; I don't know if a translation to a digital signal is possible in the GSM protocols).

Most lowish bit rate codecs can carry DTMF pretty well when tested in isolation. The error rates are higher than without the codec in the way, but not too bad. However, when DTMF is sent from the network to a handset the results are terrible. I've never found the time to investigate why that is.

>> but how will FSK work at 300bps work? The symbol time for 2 level FSK is
>> 3.3ms, but the sub-frame time for most cell phone codecs is 5ms, and you
>> can't easily sync the data timing to it. That doesn't seem like a good
>> match.
>
>The fixed codebook provides no less then 4 pulses per subframe. This is
> sufficient to digitize slow speed PSK or FSK.

The theory sounds reasonable. My practical experience with trying V.21 across a GSM EFR network was that it was totally unusable. Again, I've never had the time to investigate what is really going on.

Regards,
Steve

steveu wrote:
>>steveu wrote:
>>
>>>>If you need to get something quick and dirty that simply works via
>>>>vocoder path, run DTMF or PSK or FSK at 300 bps.
>>>
>>>DTMF goes through reliably, as the cell phone systems carry that as data
>>
>>DTMF can go through vocoder. It gets distorted, however it still can be
>>decoded well.
>
> Handset to network the DTMF is always turned into a digital signal, and
> its timing is played around with by the system. Network to handset the DTMF
> audio passes through the channel (at least in most cases. I don't know if a
> translation to a digital signal is possible in the GSM protocols). Most
> lowish bit rate codecs can carry DTMF pretty well, when tested in
> isolation. The error rates are higher than without the codec in the way,
> but not too bad. However, when DTMF is sent from the network to a handset
> the results are terrible. I've never found the time to investigate why that
> is.

The tones look like bandpass noise, and there could be severe twist. Filter-type DTMF decoders work; frequency-counter types fail.

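
For readers unfamiliar with the terms: a filter-type detector measures the energy at each DTMF frequency (the Goertzel algorithm is a common choice), and twist is the level difference between the two tones of a digit. The sketch below is a generic illustration, not the detector used by anyone in this thread; the 8 kHz sample rate, the function names, and the sign convention (high-group level relative to low-group level, in dB) are assumptions.

```python
import math


def goertzel_power(samples, freq, fs):
    """Squared magnitude of the signal's spectrum at freq (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = 0.0
    s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2 = s1
        s1 = s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def twist_db(samples, f_low, f_high, fs=8000.0):
    """Level of the high-group tone relative to the low-group tone, in dB."""
    p_low = goertzel_power(samples, f_low, fs)
    p_high = goertzel_power(samples, f_high, fs)
    return 10.0 * math.log10(p_high / p_low)
```

For DTMF digit "1" the two tones are 697 Hz and 1209 Hz, so `twist_db(samples, 697.0, 1209.0)` would report how far the codec has skewed their relative levels.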
>>>but how will FSK work at 300bps work? The symbol time for 2 level FSK is
>>>3.3ms, but the sub-frame time for most cell phone codecs is 5ms, and you
>>>can't easily sync the data timing to it. That doesn't seem like a good
>>>match.
>>
>>The fixed codebook provides no less then 4 pulses per subframe. This is
>> sufficient to digitize slow speed PSK or FSK.
>
> The theory sounds reasonable. My practical experience with trying V.21
> across a GSM EFR network was its totally unusable. Again, I've never had
> the time to investigate what is really going on.

The modem carrier should be low: a few hundred Hz.

I used to know a company which provided data solutions over cellular. That was before GPRS, SMS and cellular modems. There was no problem with analog systems; however, they had to make it work over DAMPS and GSM. A small microcontroller did that.

Anyway, pushing a data stream through the vocoder path would be an interesting exercise in DSP; however, I don't know of any useful application of that.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Vladimir Vassilevsky wrote:


> Anyway, pushing data stream through vocoder path would be an interesting
> exercise in DSP, however I don't know of any useful application of that.

Speaking of weird DSP ideas: I remember someone trying to invent an efficient method of using paper as a medium for digital data storage. The encoded data is printed on paper and can then be scanned back. The reasoning was that it was supposed to be compact, cheap and more reliable than floppies and CD-Rs.

VLV

On Mon, 02 Nov 2009 10:46:00 -0600, Vladimir Vassilevsky
<nospam@nowhere.com> wrote:

>Vladimir Vassilevsky wrote:
>
>> Anyway, pushing data stream through vocoder path would be an interesting
>> exercise in DSP, however I don't know of any useful application of that.
>
>Speaking of weird DSP ideas: I remember someone trying to invent an
>efficient method to use paper as a media for digital data storage. So
>the encoded data is printed on paper and then could be scanned back. The
>reasoning was it supposed to be compact, cheap and more reliable then
>floppies and CD-Rs.

Absolutely. Back in the days of audio cassettes, Byte Magazine had listings in a barcode-like format called PaperByte. Example (scanned): http://primepuzzle.com/waduzitdo/waduzitdo.html (JavaScript required).

--
Rich Webb     Norfolk, VA

Rich Webb wrote:

> On Mon, 02 Nov 2009 10:46:00 -0600, Vladimir Vassilevsky
> <nospam@nowhere.com> wrote:
>
>>Speaking of weird DSP ideas: I remember someone trying to invent an
>>efficient method to use paper as a media for digital data storage. So
>>the encoded data is printed on paper and then could be scanned back. The
>>reasoning was it supposed to be compact, cheap and more reliable then
>>floppies and CD-Rs.
>
> Absolutely. Back in the days of audio cassettes, Byte Magazine had
> listings in a barcode-like format called PaperByte. Example (scanned)
> http://primepuzzle.com/waduzitdo/waduzitdo.html (javascript req'd).

With some signal processing, one could probably achieve a data density of up to a megabyte per page or so.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

>steveu wrote:
>>>
>>>steveu wrote:
>>>
>>>>>If you need to get something quick and dirty that simply works via
>>>>>vocoder path, run DTMF or PSK or FSK at 300 bps.
>>>>
>>>>DTMF goes through reliably, as the cell phone systems carry that as data
>>>
>>>DTMF can go through vocoder. It gets distorted, however it still can be
>>>decoded well.
>>
>> Handset to network the DTMF is always turned into a digital signal, and
>> its timing is played around with by the system. Network to handset the DTMF
>> audio passes through the channel (at least in most cases. I don't know if a
>> translation to a digital signal is possible in the GSM protocols). Most
>> lowish bit rate codecs can carry DTMF pretty well, when tested in
>> isolation. The error rates are higher than without the codec in the way,
>> but not too bad. However, when DTMF is sent from the network to a handset
>> the results are terrible. I've never found the time to investigate why that
>> is.
>
>The tones look like bandpass noise, and there could be severe twist.
>Filter type DTMF decoder work, frequency counter types fail.

I think you missed my point. I strung together a DTMF generator, a GSM FR or EFR codec, and a DTMF detector in a lab setup, and the results were OK. I took the same generator and detector and tried them across a clean GSM link. The results were completely unusable. I'm pretty sure the error rate on the link must have been low during my testing. I never had the time to go back and figure out the cause. It might be weird stuff happening in echo cancellers (some of which really screw up DTMF, for some strange and debilitating reason).

Steve