text-to-speech engines - adding voices

dstarfire · May 30, 2014, 4:59pm

How hard would it be for a development studio to create a text to speech (TTS) engine that mimics a fictional character? What about making an engine that could do a variety of voices?* Assume you’ve got full cooperation from the voice actor who does the voice for that character.
*I know many GPS devices with voice directions offer you a variety of voices, but I don’t know if that’s a bunch of different engines (which means changing to a different voice is a resource-intensive operation) or just different profiles for the same engine.
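Here is a minimal Python sketch of the second arrangement, assuming a voice is just a profile (pitch, rate, and a pointer to a set of recorded speech) loaded into one shared engine. The class names, fields, and the cached "unit database" are hypothetical illustrations, not the design of any real GPS unit or TTS product. The point it shows is that switching voices is cheap unless the new voice needs a fresh set of recordings loaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical settings that distinguish one voice from another."""
    name: str
    pitch_hz: float  # baseline pitch of the voice
    rate_wpm: int    # speaking rate in words per minute
    unit_db: str     # hypothetical ID of the recorded-speech database behind the voice


class SharedEngine:
    """One engine instance; changing voice swaps the active profile."""

    def __init__(self, profiles):
        self._profiles = {p.name: p for p in profiles}
        self._loaded_dbs = set()
        self.db_loads = 0  # number of (simulated) expensive database loads so far
        self.active = None
        self.select_voice(profiles[0].name)

    def select_voice(self, name):
        profile = self._profiles[name]
        if profile.unit_db not in self._loaded_dbs:
            # Stand-in for the costly step: loading a voice's recordings.
            self._loaded_dbs.add(profile.unit_db)
            self.db_loads += 1
        # Switching between profiles that share a database is only this assignment.
        self.active = profile

    def render(self, text):
        """Return a label describing how the text would be spoken (no audio)."""
        p = self.active
        return f"[{p.name} @ {p.pitch_hz:.0f} Hz, {p.rate_wpm} wpm] {text}"
```

With a "narrator" and a faster "narrator_fast" built on the same recordings plus a "navigator" built on different ones, going from narrator to narrator_fast costs nothing extra, while the first switch to navigator pays for one more load. Which arrangement any particular GPS unit actually uses is not something this sketch can tell you.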

bob_2 · May 30, 2014, 5:26pm

I use Natural Reader and it does a pretty good job. If I want different voices I would have to pay for them.

engineer_comp_geek · May 30, 2014, 6:56pm

I’ve done a little bit of research into text to speech, mostly with the idea of incorporating different voices into some of my own software. I didn’t get very far, because I wanted a variety of voices that were easily selectable/configurable, sounded reasonably close to a real human, and the software had to be free or cheap. I wasn’t able to find anything that fit those requirements.

What I found was that you have the cheap and simple text to speech generators that are fairly easy to customize with new voices, but they all sound extremely robotic. At the other end of the scale, you have fairly realistic voices that apparently require a huge amount of work for each new voice.

For example, if you look at something like Festival Text To Speech, you see that they took 2 hours of recorded speech to generate their “Alan” voice and 3 hours of recorded speech to generate their “Nina” voice. You also find that their documentation for the project, like that of many Linux projects, is horrible and lacking in detail. I get the impression, though, that a lot of computer processing and human labor goes into turning those hours of recordings into a usable TTS voice.

For a commercial project, I found this one using google:
https://www.cereproc.com/en/support/faqs/voicecreation

They have this to say:

They also claim that they can build custom TTS voices with as little as 40 minutes of recorded and transcribed speech, but they recommend 4 hours of speech for good quality results.

So, basically, it can be done, but it’s going to be expensive. You need at least 4 hours of your voice actor’s time (as well as the recording studio’s time), transcription services, and then the services of the TTS provider.
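To put rough numbers on that, here is a back-of-the-envelope Python sketch. Only the 40-minute minimum and the 4-hour recommendation come from the CereProc page linked above; the studio-time ratio, every hourly rate, and the field names are placeholders made up for illustration, not real quotes. The TTS provider's fee is left as an input because nothing in this thread says what it is.

```python
from dataclasses import dataclass

# From the CereProc FAQ linked above: 40 minutes minimum, 4 hours recommended.
MINIMUM_SPEECH_MINUTES = 40
RECOMMENDED_SPEECH_MINUTES = 4 * 60


@dataclass
class Rates:
    """Placeholder figures (assumed, not from the thread); swap in real quotes."""
    booth_ratio: float                   # studio minutes needed per minute of usable speech
    actor_per_hour: float                # voice actor's hourly rate
    studio_per_hour: float               # recording studio's hourly rate
    transcription_per_audio_hour: float  # transcription price per hour of audio


def estimate_cost(speech_minutes, rates, provider_fee=0.0):
    """Rough cost breakdown for recording and transcribing a custom voice."""
    booth_hours = speech_minutes * rates.booth_ratio / 60
    costs = {
        "actor": booth_hours * rates.actor_per_hour,
        "studio": booth_hours * rates.studio_per_hour,
        # Transcription is billed on the usable audio, not on booth time.
        "transcription": speech_minutes * rates.transcription_per_audio_hour / 60,
        "tts_provider": provider_fee,  # unknown here; pass in a real quote
    }
    costs["total"] = sum(costs.values())
    return costs
```

Because every line item except the provider fee scales linearly with the amount of speech, going from the 40-minute minimum to the recommended 4 hours multiplies those costs by six, whatever rates you plug in.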
