How good is [Voice Recognition] software now?

My wife, who is deaf, wants to take some classes for the first time in over 25 years. Is there any voice recognition software that will enable her to transcribe lectures directly to an iPad without “training” the software first? I’m getting conflicting information on this. I’ve found several articles which suggest that the newer Dragon products can do it easy-peasy, but when I downloaded a trial product and installed it, the software still wanted me to go through the lengthy process that conditions it to recognize my voice. Is there any product that can instantly transcribe spoken English completely on the fly?

Completely? No. That would (probably) take actual understanding of the words, which we’re not close to. The dictation system built into the Mac can take dictation without training, but only for short bits at a time (maybe 45 seconds to a minute). The one built into Office on Windows is “ok” out of the box, although it allows training to improve it. Without training, these will get maybe 80-90 percent of words right for most speakers. The higher-end (trained) ones can get 95-99 percent.

Unfortunately, none of these are likely to do well in a room full of other people, even if they’re not speaking – too much ambient noise and sound. And if the class is on something in a jargon field (technology, medicine, sciences in general), it’s likely to make a mash of the jargon.

If you want to try this, probably better to back it up with a real digital recorder – you can try to re-dictate from it later. If she can lip read, consider something like the Olympus LS-20M – it’s an “interviewer’s” recorder – much higher quality than those cheap things, stereo recording, and can capture 1080P video of the speaker at the same time. They run around $150-$200, I think (I bought mine a long time ago when they were more expensive). A high quality sound source also gives you better data if you want to try and run a filter to get rid of ambient noise before attempting recognition again.

You might want to poke a mod to change your title, BTW – “VR” usually means “Virtual Reality,” not “Voice Recognition.”

ETA: I just re-read and saw “iPad”: my understanding is that the iPad Speech Rec. versions are a little worse than the laptop one, because of the lower computation power of the chips in them.

:smack:

Some colleges have policies that provide free translators or signers for hearing-impaired students or, failing that, some transcription of the spoken lecture. Do these options exist at her college?

My understanding is that iOS voice recognition, which is used in Siri, doesn’t use the device’s own CPU but rather sends the audio file to Apple’s servers over an internet connection for processing. It needs an internet connection to work. This always seemed strange to me, and maybe what I read is not correct.

Moderator Action

VR changed to Voice Recognition in title.

Oh yes, and she fully intends to milk them for all they’re worth. We’re just evaluating backup options here. She is studying to be a mortician, so it might be hard to find a transcriber who won’t completely freak out in the embalming labs, for example.

Yep, it does, at least for Siri’s “intelligent” answers. I thought the dictation part (no “smarts,” just transcription) was on-device, but it appears not to be. I just tried it on an iPhone and an iPad in Airplane Mode, and neither accepts taps on the microphone.

This isn’t likely to be much of a problem in a modern classroom, I suspect, but it’s something to keep in mind. Dragon, specifically, does NOT need an internet connection to do its work.

If voice recognition were up to the quality that the OP hopes, I’d think that TV news stations would be using it. But I have the closed captioning on, and the errors are surprising.

Fwiw I took notes for a deaf student in a class in college. I just happened to be in the class and raised my hand when they asked for a volunteer. I took my own notes and photocopied them for her every day. She had a translator too from the school and then, I presume, volunteer note takers in each class instead of a dedicated note taker.

I work for a captioning phone company, and we use (presumably the latest) Dragon software. I agree with TimeWinder: Dragon captions 80-95% correctly, depending on whether or not you drop small-word errors (such as “but” in place of “of”) from your accuracy count.
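To make the accuracy range above concrete, here is a rough Python sketch of how a word-accuracy score can be computed, and how much it moves when small-word errors are left out of the count. This is not how Dragon or any captioning service scores itself; the function names, the `SMALL_WORDS` list, and the sample sentences are all made up for illustration.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # word missing from the transcript
                            curr[j - 1] + 1,       # extra word in the transcript
                            prev[j - 1] + cost))   # match or wrong word
        prev = curr
    return prev[-1]


# Hypothetical list of "small words"; a real scorer would pick its own.
SMALL_WORDS = {"a", "an", "the", "of", "but", "to", "in", "and"}


def word_accuracy(reference, hypothesis, ignore_small_words=False):
    """Return 1 - (word errors / reference length), optionally skipping small words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if ignore_small_words:
        ref = [w for w in ref if w not in SMALL_WORDS]
        hyp = [w for w in hyp if w not in SMALL_WORDS]
    if not ref:
        raise ValueError("reference has no words to score")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


spoken = "the body of the deceased is prepared in the lab"
captioned = "the body but the deceased is prepared in a lab"

print(round(word_accuracy(spoken, captioned), 2))                           # 0.8
print(round(word_accuracy(spoken, captioned, ignore_small_words=True), 2))  # 1.0
```

In this made-up sentence the only two mistakes are on small words, so the same transcript scores 80% or 100% depending on the counting rule, even though “but” for “of” changes what the sentence says.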

However, in order to caption this way, you would need the speaker to speak directly into your VR microphone, and with a very specific trained kind of speech (each word needs to be pronounced independently, clearly, and with little inflection). You would probably need a third party to do this for you, using a recording.

There is a Captel phone service (which I work for), where you buy a phone with a built-in screen that provides captions via a third party (me) hearing the conversation and using Dragon VR software to caption calls live. In order to use Captel you’d have to record the lecture with very good recording software and be able to call your Captel phone with the recording, via a computer phone. I don’t know much about that and it sounds like it would be a lot of work and may not appear in a convenient format to use for notes, because the entire conversation would be stored into the phone and I’m not sure if it can be saved onto a disk. You would have to speak to a representative to see if it could be done, but the recording would have to be very clear, otherwise you’d end up with a lot of (speaker unclear) in your captions.

If there were better software, it would seem strange that my company wouldn’t use that directly and save money on training captionists as vigorously as they do.

If I were you, I’d use the interpreters as well as recording the audio of the lecture.

Is there a reason that she doesn’t want to go through the training? We use such software in an environment where training isn’t possible. In fact, the speakers don’t even know they’re being transcribed, so the output is iffy at best, as they are just having a normal conversation.

With training, modern software can do a stellar job, and it doesn’t really take that long to do the initial training, so I’m just curious as to why it’s not an option.

I used an older version of Dragon in the past and it worked OK. I haven’t used the new one, but I assume it should be better than the one I used (I hate their long-ass commercials though).

If I understand correctly, she wants to record the person giving the lecture, not herself. Training here would mean getting the teacher to do the training session. Assuming there is only one teacher involved. So I think her use case matches what you say you’ve used, and probably won’t work too well.

Same process used by every Android-based program I’ve found so far - all of them use Google’s voice-recognition service. Pretty silly, since that’s already built-in :smack:, the apps are just giving you a different GUI for a system function.

We are talking about speech recognition, not voice recognition.

I have been using a dictation system based on the Dragon speech engine for years. It is probably 90% accurate, which is not even close to being good. Imagine reading a book with two or three errors on each page. Stupid errors. Also, they mainly err with the small words, so not infrequently the meaning is the exact opposite of what I said. And don’t even ask about numbers.

Drag one es abosutleyly fioabvuosl, noer, le ti me aitio youe.

I am talking about “dictation software”, the kind that you talk to and it writes; I’m not familiar enough with the field in English to know what the difference between “speech recognition” and “voice recognition” is.

The kinds that you talk to and that answer questions, translate, or perform system actions all have the same unfortunate tendency to need net access, anyway.

I use Dragon; this is transcribed by it. Bear in mind that this standard piece of software has to be able to cope with everything from an Alabama drawl to a Scottish burr, and all accents in between.

The initial training takes half-an-hour or so and that gets you up and running. It helps to allow it to scan your document file to collect your regular phrases, words not in its dictionary and syntax. Once running it does take a while to get fully up to speed. Each time you use it, it learns more, and after a while it can be pretty accurate.

It’s far from perfect, but beats typing two-fingered, hands down.

Add me to the crowd. We get this question a lot from my users, who don’t type well, and are looking for easy ways to do things. It all boils down to:

  1. You need a quiet room to get the best results.
  2. You need a high-quality microphone, preferably a “shielded” one that focuses sound from your mouth area.
  3. You have to train the software to get used to you.

In my case, my users work in construction, on jobsites, in trailers. Quiet isn’t an option. They usually try it for two or three days, but quickly get frustrated with (A) their voices not being recorded due to equipment noise in the area, (B) transcription errors, and (C) having to go back to make corrections.

We have two users that love it, but they go back to their private offices to transcribe their notes rather than trying to do it on location.