On Jan 2, 5:00 pm, HardySpicer <gyansor...@gmail.com> wrote:
> On Jan 2, 7:12 pm, "steveu" <ste...@coppice.org> wrote:
> > >On Jan 1, 9:04 am, HardySpicer <gyansor...@gmail.com> wrote:
> > >> In the bad old days of LPC Speech synthesis, the best we could hope
> > >> for was a robotic sounding voice. Now however it would often be hard
> > >> to tell the difference between real speech and synthesised. I am
> > >> guessing they use real speech samples - would this be right? The
> > >> voices use huge amounts of disk space.
> > >>
> > >> H.
> > >
> > >Well, AT & T are the best along with Cepstral voices. But this one is
> > >maybe the best of all
> > >
> > >http://www.cereproc.com/demo.html
> >
> > I wonder which AT&T synthesiser you mean. There is more than one. The
> > version that went through various ownerships, and ended up as a Nuance
> > product, seems the most widely used, and can sound pretty good. The Cereproc
> > demo sounds rather unpleasant, although it has reasonable clarity. Did you
> > listen to the studio recording by mistake? They seem to include that to
> > fool the unwary. :-\
> >
> > If you have a TTS voice which is 200MB to 300MB in size, it will probably be
> > a concatenative synthesiser. These select "best fit" units of recorded
> > speech, apply pitch shifting to get better emphasis, and blend the units
> > together. The result can sound very natural, but the clarity can be poor.
> > If the voice says something like an address, where context provides no help
> > in discriminating words, the effectiveness of these synthesisers can be
> > poor.
> >
> > If you have a TTS voice which is less than 1MB, it will probably be a true
> > synthesiser, based on something like the old Klatt synthesiser. These all
> > seem to sound rather robotic, but can achieve great clarity. If the voice
> > says something like an address, where context provides no help in
> > discriminating words, these are generally the best.
> >
> > The latest synthesisers, from people like Cepstral, seem to have voices in
> > the tens of MB range. They appear to require far less studio recording than
> > the purely concatenative synthesisers. They seem to use some hybrid
> > approach.
> >
> > Most of the commercial synthesisers can be traced back to the Speech
> > centre at Edinburgh University, and the Festival speech synthesiser they
> > produced. Cepstral and AT&T are amongst those. It looks like Cereproc may
> > be too.
> >
> > Regards,
> > Steve
>
> I've tried them all and they are all pretty good compared with the
> old-fashioned robotic LPC voices. Agreed that none are perfect yet, but we
> live in hope! I thought they used recorded speech - explains a lot. Size
> does not matter as much as it did 20-30 years ago or more, when all
> this stuff got going. I can imagine that personalities, from actors etc.,
> and voices with attitude will be the next thing.
>
> H.

If you Google, you'll find quite a lot on adding emotion to TTS. I don't know of a commercial product that adds such a feature, though. Singing TTS is another fun research area.

Steve
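The concatenative approach described in the quoted post (select a "best fit" unit, then blend the units together) can be sketched roughly as follows. This is only a toy illustration under stated assumptions, not how any product named in this thread works: sine tones stand in for recorded speech units, the "best fit" cost is nothing more than pitch distance, pitch shifting is left out entirely, and the names `sine_unit`, `select_unit`, `crossfade_join` and `max_step` are made up for the example.

```python
import math

RATE = 16000  # sample rate in Hz, an arbitrary choice for this sketch

def sine_unit(freq_hz, n_samples):
    """Stand-in for a recorded speech unit: a plain sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / RATE) for i in range(n_samples)]

def select_unit(candidates, target_pitch_hz):
    """Hypothetical "best fit": pick the unit whose pitch is nearest the target."""
    return min(candidates, key=lambda u: abs(u["pitch_hz"] - target_pitch_hz))

def crossfade_join(a, b, overlap):
    """Join two units, linearly blending the last `overlap` samples of a
    with the first `overlap` samples of b."""
    overlap = max(0, min(overlap, len(a), len(b)))
    if overlap == 0:
        return a + b
    start = len(a) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises from near 0 to near 1
        blended.append((1 - w) * a[start + i] + w * b[i])
    return a[:start] + blended + b[overlap:]

def max_step(samples):
    """Largest jump between neighbouring samples; a big value suggests a click."""
    return max(abs(y - x) for x, y in zip(samples, samples[1:]))

# Toy inventory: 434 samples makes the 120 Hz unit end near a waveform peak,
# so a plain join with a unit that starts at zero produces a visible jump.
inventory = [
    {"pitch_hz": 120, "samples": sine_unit(120, 434)},
    {"pitch_hz": 180, "samples": sine_unit(180, 434)},
]
first = select_unit(inventory, 125)["samples"]
second = select_unit(inventory, 170)["samples"]

hard = first + second                        # plain concatenation
smooth = crossfade_join(first, second, 64)   # 64-sample (4 ms) crossfade

print(f"largest step, hard join: {max_step(hard):.3f}")
print(f"largest step, crossfade: {max_step(smooth):.3f}")
```

The crossfade only removes the jump at the join itself; it says nothing about mismatches in pitch or timbre between neighbouring units, which a real voice would also have to deal with.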
Speech Synthesis
Started by ●December 31, 2008
Reply by ●January 2, 2009
Reply by ●January 2, 2009
On Jan 3, 3:48 am, ste...@coppice.org wrote:
> If you Google, you'll find quite a lot on adding emotion to TTS. I
> don't know of a commercial product that adds such a feature, though.
> Singing TTS is another fun research area.
>
> Steve

I tried the singing with no success so far!
Reply by ●January 4, 2009
For the speech synthesis backend "mbrola", there exist several TTS frontends, including singing ones and frontends that attempt to include emotions. The quality of the results varies. It's definitely not a wonder-weapon, but the system is really fun to play with, since you can enter the phonemes and the prosodic information yourself. Even without a frontend it's fairly easy to make it sing.

http://tcts.fpms.ac.be/synthesis/mbrola.html
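Since the post mentions entering phonemes and prosody by hand, here is a minimal sketch of generating MBROLA-style input for a sung phrase. It assumes the usual `.pho` layout of one phoneme per line (name, duration in ms, then optional pairs of position-in-percent and pitch-in-Hz, with `_` for silence and `;` starting a comment). The phoneme symbols `l` and `a` are placeholders, since the valid set depends on the voice database you use, and the helper names `note_to_hz`, `pho_line` and `make_pho` are invented for this example.

```python
def note_to_hz(midi_note):
    """Equal-tempered pitch of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)

def pho_line(phoneme, duration_ms, midi_note=None):
    """One line of phoneme input. With a note, hold a flat pitch from the
    start (0%) to the end (100%) of the phoneme; without one, give no pitch."""
    if midi_note is None:
        return f"{phoneme} {duration_ms}"
    hz = round(note_to_hz(midi_note))
    return f"{phoneme} {duration_ms} 0 {hz} 100 {hz}"

def make_pho(events):
    """Wrap the events in leading and trailing silence and join them up."""
    lines = ["; sung phrase, generated by a sketch", "_ 100"]
    lines += [pho_line(*event) for event in events]
    lines.append("_ 100")
    return "\n".join(lines) + "\n"

# "la la laa" on C4, D4, E4 (MIDI 60, 62, 64); consonants carry no pitch target.
melody = [
    ("l", 80), ("a", 420, 60),
    ("l", 80), ("a", 420, 62),
    ("l", 80), ("a", 840, 64),
]
pho_text = make_pho(melody)
print(pho_text)
```

Saved to a file, this text would be handed to the mbrola program together with a voice database; the command line is not shown here because it depends on the installation and the voice.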
Reply by ●January 5, 2009
HardySpicer wrote:
> On Jan 1, 9:04 am, HardySpicer <gyansor...@gmail.com> wrote:
>> In the bad old days of LPC Speech synthesis, the best we could hope
>> for was a robotic sounding voice. Now however it would often be hard
>> to tell the difference between real speech and synthesised. I am
>> guessing they use real speech samples - would this be right? The
>> voices use huge amounts of disk space.
>>
>> H.
>
> Well, AT & T are the best along with Cepstral voices. But this one is
> maybe the best of all
>
> http://www.cereproc.com/demo.html
>
> H

Although I recognized many words, I did not find that intelligible enough to get the gist of the paragraph. Compare it to Microsoft's text-to-speech or better yet, http://www.thescottishvoice.org.uk/Home/index.php

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by ●January 6, 2009
> HardySpicer wrote:
>> On Jan 1, 9:04 am, HardySpicer <gyansor...@gmail.com> wrote:
>>> In the bad old days of LPC Speech synthesis, the best we could hope
>>> for was a robotic sounding voice. Now however it would often be hard
>>> to tell the difference between real speech and synthesised. I am
>>> guessing they use real speech samples - would this be right? The
>>> voices use huge amounts of disk space.
>>>
>>> H.
>>
>> Well, AT & T are the best along with Cepstral voices. But this one is
>> maybe the best of all
>>
>> http://www.cereproc.com/demo.html
>>
>> H
>
> Although I recognized many words, I did not find that intelligible enough
> to get the gist of the paragraph. Compare it to Microsoft's text-to-speech
> or better yet, http://www.thescottishvoice.org.uk/Home/index.php
>
> Jerry

That's a good Scottish voice indeed - but why can't they get rid of those sharp glitches? Is it just down to the hours spent on chopping the voice up and the expertise of the analyst? I can't help feeling that it can be made glitch free somehow...