DSPRelated.com
Forums

Pseudo voice

Started by Vladimir Vassilevsky October 14, 2011
Quite often, there is a need to test a communication path with a 
realistic signal. I am looking for a quick and dirty method to produce a 
voice-like signal from random numbers; so it could be integrated into a 
project as a small software module. Voice-like: having the statistics 
similar to that of "average" voice in time, amplitude and frequency domains.

Has such a thing been invented already?

It is certainly possible to feed random numbers into a vocoder decoding 
part, or store and manipulate a set of pieces of recorded speech. It 
works, but that is a pretty heavyweight solution.


Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
> produce a voice-like signal from random numbers
ITU-T G.227 (Google will find it) defines the analogue RLC shaping filter for a speech signal.
> time, amplitude and frequency domains.
Time is the catch: even in the simplest vocoder, speech switches between voiced and unvoiced. Filtered noise alone does not modulate the signal strength.
> store and manipulate a set of pieces of recorded speech
In hardware: an old Votrax SC01 contains simplified speech condensed to 64 phonemes/allophones. One would still have to balance the playback somewhat to match the average voiced/unvoiced timing from long-term speech statistics.

MfG JRD
I'd be tempted to do something like the following:

Start with AWGN lowpass-filtered to 3 kHz.

Apply a gate which randomly switches the signal on and off for intervals
in the 50 msec to 400 msec range.

Apply an independently, randomly swept bandpass filter, varying between 
200 Hz and 1.5 kHz center frequency, sweeping around over intervals that
are also on average a few hundred milliseconds.

(What this won't give you is spectral fine structure; for that use
an impulse train instead of AWGN...)

Steve
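
A minimal sketch of the recipe above (lowpassed noise, random gate, randomly swept bandpass), in plain Python. The function name, the 8 kHz sample rate, the one-pole lowpass, the two-pole resonator with pole radius 0.97, and the 100 to 500 ms sweep intervals are all assumptions made for illustration; only the 3 kHz cutoff, the 50 to 400 ms gate intervals, and the 200 Hz to 1.5 kHz center-frequency range come from the post.

```python
import math
import random

def gated_swept_noise(duration_s=2.0, fs=8000, seed=1):
    """Gated, band-swept noise; filter choices are illustrative assumptions."""
    rng = random.Random(seed)
    a_lp = math.exp(-2.0 * math.pi * 3000.0 / fs)  # crude one-pole lowpass near 3 kHz
    r = 0.97                                        # resonator pole radius (assumed)
    lp = y1 = y2 = 0.0
    gate_on, gate_left = False, 0
    fc, fc_step, sweep_left = 700.0, 0.0, 0
    out = []
    for _ in range(int(duration_s * fs)):
        if gate_left == 0:                          # toggle the gate every 50..400 ms
            gate_on = not gate_on
            gate_left = int(rng.uniform(0.05, 0.4) * fs)
        gate_left -= 1
        if sweep_left == 0:                         # choose a new center-frequency target
            sweep_left = int(rng.uniform(0.1, 0.5) * fs)
            fc_step = (rng.uniform(200.0, 1500.0) - fc) / sweep_left
        sweep_left -= 1
        fc += fc_step                               # linear sweep toward the target
        lp = a_lp * lp + (1.0 - a_lp) * rng.gauss(0.0, 1.0)
        x = lp if gate_on else 0.0                  # gate sits before the bandpass
        w = 2.0 * math.pi * fc / fs
        y = (1.0 - r) * x + 2.0 * r * math.cos(w) * y1 - r * r * y2
        y1, y2 = y, y1
        out.append(y)
    return out

samples = gated_swept_noise(duration_s=1.0)
```

Because the gate is placed ahead of the resonator, each burst rings out briefly instead of cutting off abruptly. Swapping the Gaussian source for an impulse train would add the spectral fine structure mentioned in the post.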

Rafael Deliano wrote:

>> produce a voice-like signal from random numbers
>
> ITU-T G227 ( google will find it ) defines the analogue
> RLC formfilter for a speech signal.
Speech consists of impulses. Spectrally shaped noise is not a good substitute for it.
>> time, amplitude and frequency domains.
>
> Time is the catch: speech switches in the simplest vocoder
> voiced / unvoiced. Filtered noise does not modulate signal
> strength.
A vocoder with a random input (with proper statistics of the parameters) produces a signal that looks and sounds very much like real speech. But this solution is too complicated.
>> store and manipulate a set of pieces of recorded speech
>
> In hardware: an old Votrax SC01 contains a simplified speech
> that is condensed to 64 phonemes/allophones.
> One still would have to balance the playback somewhat to
> the average voiced/unvoiced time of longterm statistics
> of speech.
That is a heavyweight solution, too.

VLV
>> quick and dirty method
How about re-using a speech synthesizer? It will need a couple of megabytes, but development effort is minimal. Quick and "dirty" with a capital "D"... For example: http://espeak.sourceforge.net/
On Fri, 14 Oct 2011 10:37:40 -0500, Vladimir Vassilevsky wrote:

> Quite often, there is a need to test a communication path with a
> realistic signal. I am looking for a quick and dirty method to produce a
> voice-like signal from random numbers; so it could be integrated into a
> project as a small software module. Voice-like: having the statistics
> similar to that of "average" voice in time, amplitude and frequency
> domains.
>
> Is there such thing invented already?
>
> It is certainly possible to feed random numbers into a vocoder decoding
> part, or store and manipulate a set of pieces of recorded speech. It
> works, however that is pretty heavyweight solution.
Reading all the responses, and pondering a bit:

How about feeding an impulse train to one or more selected filters? That's basically what a vocoder does, but rather than trying for anything resembling intelligibility, go for simplicity while still retaining the essential nature of voiced speech. Use your random number generator to set the frequency and filter parameters.

Possibly intersperse the above with blasts of noise as a simple expedient to simulate certain sounds stemming from 's', 'c', and 'z'.

Dunno if it'll do what you want, but it'll do _something_.

-- 
www.wescottdesign.com
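
The impulse-train idea above might be sketched as follows, in plain Python. Everything numeric here is a hypothetical choice, not something from the thread: the 80 to 250 Hz pitch range, the two resonator ("formant") frequency ranges, the 50 to 300 ms segment lengths, the segment-type weights, and the helper names `two_pole` and `pulse_pseudo_voice`.

```python
import math
import random

def two_pole(x, fc, fs, r=0.97):
    """Two-pole resonator at fc Hz; the pole radius r is an assumed value."""
    c1, c2 = 2.0 * r * math.cos(2.0 * math.pi * fc / fs), -r * r
    y1 = y2 = 0.0
    y = []
    for v in x:
        y0 = v + c1 * y1 + c2 * y2
        y1, y2 = y0, y1
        y.append(y0)
    return y

def pulse_pseudo_voice(duration_s=2.0, fs=8000, seed=1):
    """Random mix of filtered pulse trains, noise bursts, and pauses."""
    rng = random.Random(seed)
    n_total = int(duration_s * fs)
    out = []
    while len(out) < n_total:
        seg_len = int(rng.uniform(0.05, 0.3) * fs)
        kind = rng.choices(["voiced", "noise", "silence"], weights=[6, 2, 2])[0]
        if kind == "voiced":
            period = int(fs / rng.uniform(80.0, 250.0))   # random pitch
            pulses = [1.0 if n % period == 0 else 0.0 for n in range(seg_len)]
            seg = [0.0] * seg_len
            # Two randomly tuned resonators stand in for formants.
            for fc in (rng.uniform(300.0, 900.0), rng.uniform(1000.0, 2500.0)):
                seg = [a + b for a, b in zip(seg, two_pole(pulses, fc, fs))]
        elif kind == "noise":
            seg = [0.5 * rng.gauss(0.0, 1.0) for _ in range(seg_len)]  # 's'-like burst
        else:
            seg = [0.0] * seg_len
        out.extend(seg)
    return out[:n_total]

samples = pulse_pseudo_voice(duration_s=1.0)
```

The filter state is reset at each segment boundary, which keeps the code short at the cost of small discontinuities; carrying the state across segments would be a natural refinement.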
On Oct 15, 4:37 am, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
> Quite often, there is a need to test a communication path with a
> realistic signal. I am looking for a quick and dirty method to produce a
> voice-like signal from random numbers; so it could be integrated into a
> project as a small software module. Voice-like: having the statistics
> similar to that of "average" voice in time, amplitude and frequency domains.
>
> Is there such thing invented already?
>
> It is certainly possible to feed random numbers into a vocoder decoding
> part, or store and manipulate a set of pieces of recorded speech. It
> works, however that is pretty heavyweight solution.
>
Student project no doubt.
HardySpicer wrote:
> On Oct 15, 4:37 am, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
>> Quite often, there is a need to test a communication path with a
>> realistic signal. I am looking for a quick and dirty method to produce a
>> voice-like signal from random numbers; so it could be integrated into a
>> project as a small software module. Voice-like: having the statistics
>> similar to that of "average" voice in time, amplitude and frequency domains.
>>
>> Is there such thing invented already?
>>
>> It is certainly possible to feed random numbers into a vocoder decoding
>> part, or store and manipulate a set of pieces of recorded speech. It
>> works, however that is pretty heavyweight solution.
>>
> Student project no doubt.
+9999 ;)
>> In hardware: an old Votrax SC01
> That is a heavy weight solution, too.
More ICs but simplified: http://www.embeddedforth.de/temp/resyn.pdf

It's a resynthesizer for a simple V/UV/S speech recognition frontend. It features an LFSR as noise generator for unvoiced (UV), a DTMF generator to generate a voiced "a" (V), and two digipots to soft-switch between these two and silence (S).

Simplified "speech" will always end up as a custom-made version tailored to the application. Depending on the application, some features can be discarded; others have to be preserved.

MfG JRD
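
A software analogue of such a V/UV/S resynthesizer could look like the sketch below, in plain Python. It is not taken from the linked PDF: the 16-bit LFSR taps, the 697 Hz and 1209 Hz tone pair standing in for the voiced "a", the 20 ms frame length, the 5 ms gain ramp, and the names `lfsr_step` and `vus_resynth` are all assumptions for illustration.

```python
import math

def lfsr_step(state):
    """One step of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def vus_resynth(labels, fs=8000, frame_s=0.02, ramp_s=0.005):
    """Render a string of 'V', 'U', 'S' frame labels with soft switching."""
    step = 1.0 / (ramp_s * fs)       # maximum gain change per sample
    frame_len = int(round(frame_s * fs))
    lfsr = 0xACE1                    # any nonzero seed works
    g_v = g_u = 0.0                  # the two "digipot" gains
    out = []
    n = 0
    for label in labels:
        t_v = 1.0 if label == "V" else 0.0
        t_u = 1.0 if label == "U" else 0.0
        for _ in range(frame_len):
            # Slew each gain toward its target to avoid clicks.
            g_v += max(-step, min(step, t_v - g_v))
            g_u += max(-step, min(step, t_u - g_u))
            lfsr = lfsr_step(lfsr)
            noise = 1.0 if lfsr & 1 else -1.0
            t = n / fs
            tone = 0.5 * (math.sin(2.0 * math.pi * 697.0 * t)
                          + math.sin(2.0 * math.pi * 1209.0 * t))
            out.append(g_v * tone + 0.3 * g_u * noise)
            n += 1
    return out

samples = vus_resynth("SSVVVVUUSSVVVUS")
```

The label string would normally come from a random process tuned to long-term voiced/unvoiced/silence statistics; here it is hard-coded to keep the example short.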
On Oct 14, 1:36 pm, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
> Rafael Deliano wrote:
>
> >> produce a voice-like signal from random numbers
>
> > ITU-T G227 ( google will find it ) defines the analogue
> > RLC formfilter for a speech signal.
>
> Speech consists of impulses. Spectral shaped noise is not good as a
> substitute for it.
>
> VLV
Vlad,

One of the G-Series recommendations from ITU defined a pseudo-voice signal that mimicked speech, sounded as though you could almost understand it, and seemed to work well as an input to some of the new coding algorithms. I can't remember who the gentleman was who led that effort, but if you go to a company in Germany called Head Acoustics (http://www.head-acoustics.de/eng/), a gentleman by the name of Hans Gierlich should be able to steer you to the document.

Maurice Givens