You know the ones I’m talking about. Like the ones in Star Trek (TNG)!!
I’ve been playing with some applications that use both text-to-speech output and speech recognition (via the Microsoft Speech Application Programming Interface). It’s far from flawless, but provided you’re content with recognition of pre-defined commands, this actually works tolerably well.
If you want to see the whole voice recognition demo, and not the one edited to make MS look as bad as possible, click this one.
I’m not an MS fan, but I like to see the whole story.
I agree, but the longer one isn’t as funny.
I tried selling speech-recognition software once, a few years ago. There were still a lot of bugs in it. Even when those are resolved, it’s never going to substantially replace the screen and keyboard, because most literate people can read text much faster than it can be intelligibly spoken by human or machine, and even a moderately skilled typist can type faster than he or she can talk.
Sounds cool, but think how much more annoying banner and pop-up ads are going to become.
For about a year before 9/11 I worked as a programmer at Fonix Corp., where we made speech software. Most of the technology went over my head - trained neural nets, phoneme categorization, natural grammars, what have you - but it was still incredibly freaking cool, and all I had to do was port the source code to the StrongARM processor.
Computer voice technology is more or less “there”. What’s lacking is the logic driving it, which is currently no more sophisticated than a phone menu or the menus on a DVD. It really needs advances in artificial intelligence that we haven’t reached yet. A voice-driven application can’t adapt well to different (i.e. unplanned and untested) circumstances. Voice is also inherently a one-dimensional data stream: how would you present hundreds of choices to the user, and process the resulting input, the way you can with a screen and keyboard?
The result of all that is, I watched Fonix’s product - one of the most amazing and elegant pieces of technology I’ve ever seen - relegated solely to enhancing that which is the bane of all humanity: the phone menu. Bleck.
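To make the phone-menu comparison concrete, here is a rough sketch of what that kind of logic amounts to once the recognizer has turned speech into text. This is not Fonix’s code or any real product’s API; the menu phrases, the `MENU` table and the `handle` function are all made up for illustration, and the speech-to-text step is assumed to have already happened.

```python
# Hypothetical fixed-command "voice menu". Every phrase and name here is
# invented for illustration; the transcript is assumed to come from some
# speech recognizer that is not shown.
MENU = {
    "check balance": "Reading your balance.",
    "pay bill": "Starting bill payment.",
    "talk to an agent": "Transferring you to an agent.",
}


def normalize(transcript: str) -> str:
    """Lowercase and collapse whitespace so 'Pay  Bill' matches 'pay bill'."""
    return " ".join(transcript.lower().split())


def handle(transcript: str) -> str:
    """Return the canned response for a known phrase, else a fallback prompt."""
    phrase = normalize(transcript)
    if phrase in MENU:
        return MENU[phrase]
    # Anything unplanned lands here: the program can only repeat its options.
    return "Sorry, I didn't get that. You can say: " + ", ".join(MENU) + "."


if __name__ == "__main__":
    print(handle("Pay  Bill"))
    print(handle("I think I was charged twice last month"))
```

The second call is the interesting one: the words could be recognized perfectly and the program would still have nothing useful to do with them, which is the gap between recognizing speech and understanding it.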
Judging by that Microsoft demo, voice recognition doesn’t seem to have advanced much from the last time I gave up on it, which was probably six years ago. I’m beginning to think that it is fundamentally harder than we realised, and it will be decades or centuries before we have computers that can understand speech. Which, with my predictive track record, is the cue for the announcement of GoogleSpeechbeta tomorrow.
“Shakes, you know I have the utmost enthusiasm about this mission. I think you should take a stress pill and get some rest…”
Unless I’m missing something, that wasn’t substantially different from the news story. He had a few preset commands that executed as they should, and on his second try with the voice input it got most of what he said kind of right. It didn’t recognize one of the words he said. He glossed over it by not trying to go back and correct it, which is, admittedly, what I’d do if I were in his position. Of course the news story focused on the screw-up. The screw-up overshadowed everything else he’d done with the voice recognition to that point, and even his second try wasn’t problem-free.
I’m not impressed at all. I had a Macintosh running System 7 in 1995 that would allow you to give some basic commands using voice recognition. That was at about the same level as the start of his demonstration. Mac OS 9 had more things you could do, since it was integrated better and there was scripting support for it. Ten years hasn’t made much of a difference on that end. Macs haven’t ever had voice input through the OS though, just recognition of preset commands. There is a product for the Macintosh called iListen that reportedly does a decent job of handling voice input.
From everything I’ve heard though, THE killer voice input app so far is Dragon NaturallySpeaking, which is a Windows-only program. It boasts something like a 98-99% accuracy rate right out of the box. I’m not sure why Microsoft is pushing so hard on this voice recognition thing when it’s not ready for prime time. There’s a third-party solution that obviously outperforms anything Redmond can do. I think they’re trying to give the impression that they’re not playing catch-up with Leopard, but if so they’re failing pretty miserably for the most part. Virtually all of Vista’s innovative features have been dropped (around the time it became Vista instead of Longhorn), and what’s left is looking old, tired, and not well-implemented.
Their text reader voice was good enough that Steve Jobs compared it, fairly favorably, to the revamped voice in Leopard, though, and it may have been what gave Apple the impetus to improve on their previous reader voices. That’s what I like, really, healthy competition. Apple might start to suck too if they didn’t have someone to compete against in the software arena. There are plenty of hardware companies to keep them in line on that side, but their software innovation is still largely driven by their indirect rivalry with Microsoft.
I’d love it if Microsoft just started from scratch and designed a totally new, non-legacy operating system using the best open standards out there and creating their own implementations where needed. If the design team were well-managed without being over-managed, had an overall vision and goal in mind that they stuck to, and concentrated on good design and robust implementations without useless flash and cruft, they could do something really cool. That OS would kick ass and keep kicking it for years, maybe even a decade or more. That would be an OS I’d consider buying.