All the science channels are flooded with commercials for something called Dragon, a voice recognition program. Every time I see a commercial for this I wonder about two things: can the software really listen to and distinguish words as easily as the commercial shows, and does natural speech translate well into a business memo or book report?
I ask because I once saw a demonstration of a voice recognition program (don’t remember if it was Dragon) and that thing could not understand a word I said. I’m from The Bronx and I talk fast. The program had a real problem with my natural speech patterns and the demonstrator had to instruct me (and others) to change the way we spoke into something not very natural at all.
Plus, I’ve seen people write business correspondence the same way that they speak and. . . it is not pretty. We don’t really write the same way we speak.
The demonstration I was at was many, many years ago so the programs could have improved significantly and perhaps I’m wrong about writing the same way we speak. What say you?
Normally voice recognition software makes you go through a “training phase” where you read a bunch of stuff and it saves a profile unique to your voice, so that in theory it can understand you better. The problem is that even within an individual there’s so much variation in the way you sound that it’s never going to be perfect (and don’t even start with homonyms!). It helps if you make a concerted effort to say everything in your clearest, most monotone voice possible. But that’s not an effort I particularly want to make - and even if that wasn’t necessary I still feel like speaking is more effort than manipulating a good old-fashioned mouse and keyboard. But maybe that’s just me.
Have a smartphone or iPad? Dragon has an app, so you can try it for yourself. No training is required and it works pretty well for me, but depending on your voice it may or may not be good for you.
Just remember you need the app to ‘wreck a nice beach’
I used a voice recognition program probably 10 years ago (so I’m sure they’re a lot better than that). I have a reasonably neutral accent, but a tendency to slur my words. It was slower than typing, by far. But if you don’t type as fast, why not try it?
It’s really not about recording your natural speech so much as you dictating to it instead of typing.
I have it on my Android phone. It works surprisingly well, with no training necessary. But then, I have a pretty neutral midwestern accent.
You are correct that we don’t talk the way we write, so if you are dictating business correspondence, you sort of have to “type with your mouth” rather than just talk naturally.
I can totally see people developing a new style of ‘writing’. Like leet speak. Oh! Might this be the death of that horrid method of writing? Bizarre overheard conversations could be the new Damn You AutoCorrect.
Even on my semi-conscious phone I had rudimentary voice recognition (“call,” “play music,” etc.), although I haven’t been able to get one to work on an Android smartphone yet, simply because I haven’t found a good app. The one that came with the phone doesn’t do anything.
I used Dragon earlier this year when I did something nasty to the thumb of my right hand and couldn’t type for ~6 weeks (my boss was not going to allow me to get away with no typing!). For general stuff, like writing emails, summary documents, status reports, and the bulk of scientific papers, it was pretty good. It fell down (unsurprisingly) if I needed to code, or use specialized software, but otherwise it was pretty good.
Not at all. I saw a demonstration of some VR software at EPCOT Center one time and thought “This is the way of the future!” I mean, supposedly it could even tell if you meant to say “which” or “witch” based on context, among other amazing things. I went home and bought it (I want to say Dragon’s “Generally Speaking”?). I spent 5 HOURS or so prepping it during installation by repeating phrases so it would recognize the timbre and inflection of my voice, or whatever. I was literally hoarse by the time I was done.
Result? Even after all that, everything I tried to speak came out as complete gibberish. Unless you have a legit reason to use it (ex: you are blind and cannot see to type and have the genuine desire to futz with it, and more power to you if that’s the case), it’s a total blowout.
You know how you call customer service and you get this cheerful recorded voice saying things like, “To make a payment, say, ‘payment’” and you say “payment,” and then it says, “I didn’t understand that. To make a payment, say, ‘payment.’ To request a copy of your most recent bill, say, ‘bill.’”
So you say “payment” again.
“I didn’t understand that. Hold while I transfer you to a live operator…If you’d like to make a call, hang up…”
That’s pretty much what it’s like. It understands the occasional word and fills in weird stuff when it doesn’t.
I knew a blind woman who used Dragon (Naturally Speaking, I think it was) and she had spent many hours training (I don’t know if she was training the software, or herself), and she liked it. There was some great editing feature where it read what she’d written back to her and she could edit it and question the spelling. Very cumbersome, though; I don’t know why she didn’t just learn to touch-type.
I had some kind of VR program on my computer some years ago, and I tried it. Every once in a while I’d get a pristine paragraph that actually typed what I’d said, but mostly it looked like Dutch, and I type fast (lotsa typos, but there’s spell check) so I abandoned it. I was going to use it to transcribe interview tapes, but that really wouldn’t have worked: even if it had ever learned my voice, that wouldn’t have helped with whoever I was interviewing.
I use the voice recognition software on my Android phone for text messaging all the time, and it usually works great, including rather complex messages. Once in a while it screws up badly because I slur words.
However I don’t write in a way that talking could handle, so I have no desire to use it for that.
I agree with this - as long as I speak slowly and distinctly and there’s not a lot of ambient noise, it works surprisingly well for text messaging. I also get voice mails on my cell phone translated into text messages. Since people are often speaking normally, they’re sometimes amusingly garbled although I can get the general gist.
You do have to train it, but it will learn your accent. You will, however, have to use very clear inflection; otherwise, it won’t understand you. It does work pretty well, though, once you get used to it and it gets used to you.
That said, it is wonderful assistive technology for people with dyslexia or impaired fine motor skills or some other problem that keeps them from typing with a keyboard.
One of my friends can’t use a keyboard with his hands, and uses voice recognition habitually for email, IM, etc. I was amazed, because the last time I tried voice recognition software it was only so-so, but I’d seen emails and IMs from him and had no idea: I didn’t notice any characteristic voice recognition mistakes (I don’t know how much he has to go back and correct, but he was IMing no slower than most other people). I assume he used one of the better products and did whatever training was necessary, and I assume there’s still a lot of mediocre voice recognition as well, but it shows that in only a few years, voice recognition has improved a lot.
I don’t know about Dragon, but on my iPhone, between Siri, AIM, and iChat, I’m amazed at how well it has been able to understand my spoken dictation, even to the point of putting in the proper punctuation when I speak its name. For example, I dictated the following sentence to my AIM app:
“I wonder if it will end the sentence with a period if I say period, period.”
I got:
“I wonder if it will end the sentence with a period if I say period.”
I was quite amused and impressed by this. I guess voice recognition is getting better.
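For what it’s worth, here is a toy sketch of the simplest rule that would give the result above: treat a command word as punctuation only when it is the very last word spoken. This is purely a guess for illustration, not how Siri or AIM actually does it; the `SPOKEN_PUNCTUATION` table and the `apply_trailing_punctuation` function are made-up names.

```python
# Hypothetical table of spoken command words and the marks they stand for.
SPOKEN_PUNCTUATION = {"period": ".", "comma": ","}


def apply_trailing_punctuation(transcript):
    """Turn a trailing command word into its punctuation mark.

    Assumption: only the LAST word is treated as a command, so an
    earlier "period" in the sentence is kept as an ordinary word.
    """
    words = transcript.split()
    if words and words[-1].lower() in SPOKEN_PUNCTUATION:
        mark = SPOKEN_PUNCTUATION[words[-1].lower()]
        return " ".join(words[:-1]) + mark
    return transcript


raw = "I wonder if it will end the sentence with a period if I say period period"
print(apply_trailing_punctuation(raw))
# I wonder if it will end the sentence with a period if I say period.
```

A real dictation engine presumably uses context (and pauses) rather than position alone, which is why this rule would fall over on a command word spoken mid-sentence.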
I have Dragon NaturallySpeaking 11, and find that it works very well with as little as 1 hour of training. I suspect the people who have bad luck with it don’t have their microphones set up well. Go through their setup and play back the audio being recorded. If it is noisy at all, that is going to dramatically impact the accuracy of the recognition.
I guess my years of audio engineering came in handy. If I were to visit any of the people reporting dreadful results and check their audio setup and microphone usage, I’m sure I could get their accuracy up to 99% as well.
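If you want a number instead of just listening to the playback, one rough way to check for noise is to compare the level of the recording while you are dictating against the level while you are silent. The sketch below computes that signal-to-noise ratio in decibels. The sample values are invented for illustration, and this is not something Dragon’s setup does for you as far as I know; the function names are my own.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB: 20 * log10(rms(speech) / rms(noise))."""
    return 20 * math.log10(rms(speech) / rms(noise))


# Hypothetical sample values (not real recordings), scaled to -1..1.
room_noise = [0.01, -0.01, 0.01, -0.01]  # mic open, nobody talking
speech = [0.5, -0.5, 0.5, -0.5]          # same mic while dictating

print(round(snr_db(speech, room_noise), 1))  # prints 34.0
```

The higher the number, the more your voice stands out from the background. If it drops when the fan is on, that is the noise to get rid of.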
The software doesn’t just record in the vocal range. Like any microphone, it picks up EVERY sound in the environment, and tries to assimilate it. For comparison, ask someone who uses a hearing aid how it works for them - it doesn’t just amplify the stuff you WANT to hear, it amplifies EVERYTHING in the environment.
If you have, for example, a fan running in the background with a constant hum, or a loud computer fan, or are by a freeway, the program can’t filter these sounds out. It CAN figure out that your voice should be the loudest thing in the area, so it tries to transcribe by volume.
So, if it’s you, alone, with a quiet computer in a quiet room, and you speak like Eliza Doolittle, you should be able to get about 80% or so success. The more you deviate from that norm, the worse results you get.
When setting the software up, it specifically warns the user against over-enunciating words. I speak exactly as I normally do, and can even speak faster than my normal rate. I have a Midwestern accent, having grown up in Kansas City (the same accent as Johnny Carson and Walt Disney) with the vocabulary and diction of most computer geeks.
My main use of Dragon is to transcribe interviews by echoing or “parroting” what I’m hearing in a pair of headphones. I wind up speaking very quickly to keep up with the interview, and Dragon keeps up perfectly well. I have to review it to correct, but that is usually due to unintentionally paraphrasing what I heard.