comp.dsp | Speech Synthesis| page 2

Reply by ●January 2, 2009

On Jan 2, 5:00 pm, HardySpicer <gyansor...@gmail.com> wrote:
> On Jan 2, 7:12 pm, "steveu" <ste...@coppice.org> wrote:
>
>
>
> > >On Jan 1, 9:04 am, HardySpicer <gyansor...@gmail.com> wrote:
> > >> In the bad old days of LPC Speech synthesis, the best we could hope
> > >> for was a robotic sounding voice. Now however it would often be hard
> > >> to tell the difference between real speech and synthesised. I am
> > >> guessing they use real speech samples - would this be right? The
> > >> voices use huge amounts of disk space.
>
> > >> H.
>
> > >Well, AT & T are the best along with Cepstral voices. But this one is
> > >maybe the best of all
>
> > >http://www.cereproc.com/demo.html
>
> > I wonder which AT&T synthesiser you mean. There is more than one. The
> > version that went through various ownerships, and ended up as a Nuance
> > product seems the most widely used, and can sound pretty good. The cereproc
> > demo sounds rather unpleasant, although it has reasonable clarity. Did you
> > listen to the studio recording by mistake? They seem to include that to
> > fool the unwary. :-\
>
> > If you have a TTS voice which is 200MB to 300MB in size, it will probably be
> > a concatenative synthesiser. These select "best fit" units of recorded
> > speech, apply pitch shifting to get better emphasis, and blend the units
> > together. The result can sound very natural, but the clarity can be poor.
> > If the voice says something like an address, where context provides no help
> > in discriminating words, the effectiveness of these synthesisers can be
> > poor.
>
> > If you have a TTS voice which is less than 1MB, it will probably be a true
> > synthesiser, based on something like the old Klatt synthesiser. These all
> > seem to sound rather robotic, but can achieve great clarity. If the voice
> > says something like an address, where context provides no help in
> > discriminating words, these are generally the best.
>
> > The latest synthesisers, from people like Cepstral, seem to have voices in
> > the tens of MB range. They appear to require far less studio recording than
> > the purely concatenative synthesizers. They seem to use some hybrid
> > approach.
>
> > Most of the commercial synthesisers can be traced back to the Speech
> > centre at Edinburgh University, and the Festival speech synthesizer they
> > produced. Cepstral and AT&T are amongst those. It looks like Cereproc may
> > be too.
>
> > Regards,
> > Steve
>
> I've tried them all, and they are all pretty good compared with the
> old-fashioned robotic LPC voices. Agreed that none are perfect yet, but
> we live in hope! I thought they used recorded speech - that explains a
> lot. Size does not matter as much as it did 20-30 years ago or more,
> when all this stuff got going. I can imagine that personalities will be
> the next thing, from actors etc., and voices with attitude.
>
> H.

If you Google, you'll find quite a lot on adding emotion to TTS. I
don't know of a commercial product that adds such a feature, though.
Singing TTS is another fun research area.

Steve

Reply by HardySpicer ●January 2, 2009

On Jan 3, 3:48 am, ste...@coppice.org wrote:
> If you Google, you'll find quite a lot on adding emotion to TTS. I
> don't know of a commercial product that adds such a feature, though.
> Singing TTS is another fun research area.
>
> Steve

I tried the singing with no success so far!

Reply by banton ●January 4, 2009

For the speech synthesis backend "mbrola", there exist
several TTS frontends, including singing ones and frontends that
attempt to include emotions.  The quality of the results varies.
It's definitely not a wonder-weapon, but the system is really
fun to play with since you can enter the phonemes and the prosodic
information yourself.  Even without a frontend it's fairly easy
to make it sing.

http://tcts.fpms.ac.be/synthesis/mbrola.html
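
As a rough illustration of entering the phonemes and prosody by hand, here is a minimal Python sketch that writes MBROLA-style .pho text for a short sung phrase. Each line carries a phoneme, a duration in ms, and optional (position %, pitch Hz) pairs. The phoneme symbols `l` and `A`, the 60 ms onset length, and the helper names are assumptions made for illustration; the real symbols depend on the voice database you load.

```python
ONSET_MS = 60  # assumed length of the consonant before each sung vowel


def midi_to_hz(note):
    """Equal-tempered pitch in Hz for a MIDI note number (69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def sing(melody):
    """Build .pho text from (onset, vowel, midi_note, duration_ms) tuples.

    The vowel carries the note: its pitch is pinned to the same value at
    0% and 100% of its duration, so the pitch stays flat like a sung tone.
    """
    lines = [
        "; sketch only: phoneme symbols depend on the voice database",
        "_ 100",  # leading silence
    ]
    for onset, vowel, note, duration_ms in melody:
        hz = round(midi_to_hz(note))
        if onset:
            lines.append(f"{onset} {ONSET_MS}")
            duration_ms -= ONSET_MS
        lines.append(f"{vowel} {duration_ms} 0 {hz} 100 {hz}")
    lines.append("_ 100")  # trailing silence
    return "\n".join(lines) + "\n"


# Hypothetical three-note "la la la" phrase: C4, D4, E4.
melody = [("l", "A", 60, 500), ("l", "A", 62, 500), ("l", "A", 64, 1000)]
print(sing(melody))
```

The resulting text can be saved as a .pho file and fed to mbrola together with a voice database. Keeping the pitch flat across each vowel is the simplest choice; adding more (position, pitch) pairs per line would give glides or vibrato.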

Reply by Jerry Avins ●January 5, 2009

HardySpicer wrote:
> On Jan 1, 9:04 am, HardySpicer <gyansor...@gmail.com> wrote:
>> In the bad old days of LPC Speech synthesis, the best we could hope
>> for was a robotic sounding voice. Now however it would often be hard
>> to tell the difference between real speech and synthesised. I am
>> guessing they use real speech samples - would this be right? The
>> voices use huge amounts of disk space.
>>
>> H.
> 
> Well, AT & T are the best along with Cepstral voices. But this one is
> maybe the best of all
> 
> http://www.cereproc.com/demo.html
> 
> H

Although I recognized many words, I did not find that intelligible 
enough to get the gist of the paragraph. Compare it to Microsoft's 
text-to-speech or better yet, 
http://www.thescottishvoice.org.uk/Home/index.php

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by VelociChicken ●January 6, 2009

> Although I recognized many words, I did not find that intelligible enough 
> to get the gist of the paragraph. Compare it to Microsoft's text-to-speech 
> or better yet, http://www.thescottishvoice.org.uk/Home/index.php
>
> Jerry

That's a good Scottish voice indeed - but why can't they get rid of those 
sharp glitches? Is it just down to the hours spent on chopping the voice up 
and the expertise of the analyst? I can't help feeling that it can be made 
glitch free somehow...
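
One reason joins can click is a plain waveform discontinuity where two recorded units meet. The following is a minimal Python sketch, not a description of how any of the products mentioned here work: it butt-joins two sine bursts with mismatched phase, then joins them with a short linear crossfade, and compares the largest sample-to-sample jump. The sample rate, test tones, overlap length and function names are all assumptions for illustration.

```python
import math

RATE = 8000  # assumed sample rate in Hz


def sine(freq_hz, n, phase=0.0):
    """n samples of a unit-amplitude sine at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / RATE + phase) for i in range(n)]


def butt_join(a, b):
    """Concatenate two units with no smoothing at the boundary."""
    return a + b


def crossfade_join(a, b, overlap):
    """Overlap the end of a with the start of b using a linear crossfade."""
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must fit inside both units")
    start = len(a) - overlap
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight ramps from near 0 to near 1
        mixed.append((1 - w) * a[start + i] + w * b[i])
    return a[:start] + mixed + b[overlap:]


def max_jump(x):
    """Largest absolute difference between neighbouring samples."""
    return max(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))


# Two 200 Hz "units" whose phases do not line up at the join.
a = sine(200, 400)
b = sine(200, 400, phase=math.pi / 2)

butt = butt_join(a, b)
smooth = crossfade_join(a, b, overlap=80)

print("largest step, butt join:     ", max_jump(butt))
print("largest step, crossfade join:", max_jump(smooth))
```

A crossfade only hides amplitude steps; pitch and spectral mismatches between units need other treatment, so it is at best part of the answer.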
