Is this voice computer generated?

Just curious…

Watching this YouTube channel it appears the overdub is computerized. But not sure why they would bother with it. On the other hand maybe it’s just being read. Opinion?

I get something weird during the “so few of their fossil remains …” part early on.

It sounds like “fossilly”. But there are multiple streaming glitches for me, so I’m not sure.

But if that is the actual sound, then there’s something very weird going on.

IMO yes, that is a computer generated voice.

There are various commercially available text-to-speech programs.

Definitely computer generated. Not a shred of doubt.

Stephen Hawking was once considering upgrading his speech synthesizer to something more like this, which is quite a bit more naturalistic, but he decided not to, as it made him sound like a different person and his current voice is really his trademark.

Anyway, the giveaways for this track are the breathlessness, monotone, and unusual emphasis.

Yeah, not a human: 00:20 - “so few of their fossy leased remains have been discovered”

I’ve seen a lot of these videos - slideshow with zoom, synthesised voiceover. I’m guessing there’s an app out there somewhere that you just load up with text and images and it spits out a YouTube video to pad your view count.

I understand that on YouTube, a lot of people do that because they aren’t native English speakers.

It could be that, but given the relatively content-free imagery of all of the videos on this channel (and others like it), I think this is probably part of a (failed) get-rich-quick scheme based on an automated video creator - you dump (possibly stolen) text and images in one end, and YouTube clickbait videos come squirting out of the other end. You monetize them and sit back to enjoy the profits of no labour.

Except that nobody really wants to watch them, so it doesn’t actually make money.

I am very surprised to hear that this voice was computer-generated. Some of the stresses seem natural; if a computer is “smart” enough to produce those stresses — some of which may be context-dependent — then I am very impressed. Is it certain it was computer-generated?

The narration here is fairly fast-paced. My guess was that he had trouble keeping clear, even diction while speaking quickly, and forced himself into an even-paced, and thus stilted, tone to keep up.

From the very first words it sounds very clearly and obviously computer generated to me.

Yes, TTS (text-to-speech) programs have gotten a lot better. I’m not sure what the state of the art is right now, but even things like emotional expressiveness are being algorithmically programmed into TTS software. Here’s one example. And that’s from five years ago. They don’t sound quite realistically human yet to my ears, but they’re getting closer and closer.

No doubt in my mind. In fact, if it’s a human, then it’s quite a talented voice mimic to be able to impersonate a speech synthesiser. The intonation is clever, but there are algorithms for that, nowadays.

A human voiceover artist would not pronounce fossilized as FOSS-ee-leezed.

Listen to the catch between the letters D and A when he says DNA at 0:40. A human would not insert that break.

No doubt in my mind this is synth.