0:15 “displacing large amounts of WAter. This displacement of water quickly generated a towering WAave.”
0:48 “pulling some structures several hundred feet into the Ocean. Yet, partially due to sheer luck, there were no injuries or faTAlities.”
It isn’t just a matter of accenting syllables. The pitch pattern of the syllables is also part of it. There’s another (history? political?) channel that does/did this, but mercifully I’ve forgotten the name because I clicked on “do not recommend this channel” long ago. I’m doing the same with this channel.
Is there a word for this pattern? The only thing I can think of is “singsong”, but that doesn’t seem quite right.
I’m not sure about the pattern, but his pronunciation sounds weird to me, like someone who’s accustomed to slurring his words trying, and failing, to enunciate clearly.
ETA: I was thinking it sounded oddly robotic! That would explain it.
GeologyHub isn’t using text to speech; that’s just the presenter’s manner of speaking, very occasionally made weirder by the audio editing.
When editing spoken audio for stumbles and mistakes, a common trick is to join the successful first half of one take with the successful second half of another take. I use this a lot in my own videos. Sometimes the joins are seamless and unnoticeable; sometimes there’s a weird little discontinuity in the intonation, pace, pitch, or something. I get a lot of people saying I’m using TTS.
The voice in this clip may sound robotic, but there are people who speak this way. It’s an affected style adopted by some people, particularly for voiceovers.
I mean, the guy has been making videos for 6 years; the recording quality has improved a great deal since the early videos, but it’s the same person narrating across the whole of that period. TTS programs did not sound that good 6 years ago.
To me it sounds like a human who naturally speaks in a monotone and is trying to add some modulation to their speech, like a news anchor or documentary narrator voice, but isn’t very good at it.
This pattern is so quirky that I don’t think there’s a name for it. There aren’t a lot of names for speech patterns (“uptalk” is the only one I can think of). Aside from the intonation, this person’s enunciation is not very good. Sounds like he had a couple of drinks first.
If you’re performing for broadcast, you don’t underline every third word for emphasis, because it sounds really unnatural. What you want to do is talk the way people normally talk.
Obviously I was joking. It was for the opposite reason: Turing assumed it would be too easy to distinguish between human and AI if the AI also faced the challenge of understanding and generating speech.
You’ve heard of “Valley Girl speak,” aka “upspeak,” where the speaker goes up at the end of every sentence, making everything sound like a question. If you listen to this clip a couple of times, you notice the intonation goes down at the end of every sentence. Not surprisingly, it’s called “downspeak.” My WAG is that the TTS robot sees the period at the end of the sentence and automatically shifts it to downspeak.
Of course, upspeak and downspeak really only work when the speaker is finished speaking, not after every sentence.
It sounds similar to some British accents, but used in this mechanical manner it stands out as artificial. The building I used to work in as seldom as possible had a parking payment machine that sounded like Stephen Hawking was trapped inside. That was more appealing to me than the attempts at realistic voice synthesis. But the main constraint has been storing the breadth of voice samples needed for natural-sounding speech; those limits have largely disappeared, and we’re going to have a harder time detecting voice synthesis in high-end applications in the future.