VoiceRecognition


Computers hearing a voice and understanding the words. Some of this has already been discussed under HandwritingRecognition. The general case is "machine recognition of human communication" but that would look absolutely dreadful wiki'd.

Slashdot had a [link] to a story about a researcher named Shneiderman who says we'll never use a primarily voice-based computing system because we think and talk with the same parts of the brain.

There are plenty of places where voice-based computing makes sense. Computers are coming into cars, and while it may take the same parts of the brain to think and talk, it doesn't take the same parts to talk and drive. However, there are open questions about how to add technology, and what kinds of technology you can add, before people start driving into trees. Seeing as we've done ClothingAndTechnology (now at CarryingGadgets), CarsAndTechnology? would probably be a good place to go next.

VoiceRecognition is more properly "a computer receiving a digitized sound signal and identifying the words which were spoken". Understanding the words is a whole extra kettle of fish, ranging from simple word commands, to natural language interface, and up to ArtificialIntelligence.

VoiceRecognition for day-long computer operation won't be happening any time soon, but not for technological reasons ... you try talking aloud for 8 hours and see how long your voice lasts. Even once you've trained your voice and throat to sustain that torture, the guy in the cubicle next door will probably want to kill you ... kill him first, he has a really whiny voice. Ah, peace and quiet, time to get some work done ... uh oh, speaking out loud is not as fast as 80-120 wpm typing.

: Both the "going hoarse" and "thinking and speaking" arguments are compelling to me, but there are places where it makes sense. Computing while driving, for one. I'd say a headphone-and-mic PDA as another choice, but if Sunir's vanity won't let him put a handheld on his belt, walking around talking to yourself doesn't make much sense, either. I'm guessing there'll be a linguistic shorthand that we use to do a lot of the interface, much like we use faux-HandwritingRecognition to write on our iPaqs and PalmOS devices. The interesting bit is that we're moving from replacement at the letter level to the word level. Graffiti only needs to recognize A-Za-z0-9 plus selected dingbats, but the vocal equivalent (let's call it Babble, until someone makes it and releases it and gives it their own name) would have to understand the words and put them together. This, of course, would have to be highly context-reliant. The utterances you use to talk to your car would be different from the ones you use in your house, with the possible exception of "Lights", but I am very fond of having that one with a hard switch under the driver's control, myself.

: A flash: this stuff ties in with PervasiveComputing, and while we might accept having our smart devices to talk to, the work of traditional computing would, except when translated to PervasiveComputing platforms in a way that makes sense for those platforms, be done the way it is now. The thought of composing email while driving fills me with fear. "Reading" email, however, wouldn't be so bad. --DaveJacoby
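
The context-reliant vocabulary idea above can be sketched roughly as a lookup keyed first by where you are and then by what you said. This is only an illustration, not anyone's actual system: the contexts, utterances, and action names below are all invented, and it assumes the recognizer has already turned the sound into words.

```python
# Hypothetical per-context vocabularies. Every context, utterance, and
# action name here is made up for illustration.
VOCABULARIES = {
    "car": {
        "lights": "toggle_headlights",
        "read mail": "read_email_aloud",
    },
    "house": {
        "lights": "toggle_room_lights",
    },
}


def interpret(context, utterance):
    """Map already-recognized words to an action name for this context.

    Returns None when the utterance is not in the context's vocabulary.
    """
    vocabulary = VOCABULARIES.get(context, {})
    return vocabulary.get(utterance.strip().lower())


if __name__ == "__main__":
    print(interpret("car", "Lights"))
    print(interpret("house", "Lights"))
    print(interpret("house", "read mail"))
```

The same word, "Lights", resolves to a different action in each context, and an utterance that only makes sense in the car is simply not understood in the house.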

Is it true that typing is faster than speaking? I just did a little experiment with the paragraph Both the "going hoarse"....under the driver's control, myself., above. Speaking it fast took 50 seconds, and typing it fast (without correcting mistakes, and skipping some punctuation) took 2:45, more than 3 times as long. I don't remember my exact typing speed, but it is more than 60 wpm. So for my rate of speech, at least, speaking would be faster even if I typed at 120 wpm. Perhaps the claim that "typing is faster than speaking" is talking about speaking at a normal rate, rather than the fastest speaking that can be easily understood? In this case I would think that it will be a small hurdle (couple of years, maybe) to get computers to recognize fast speaking almost as well as slow speaking (once they are doing well on slow speaking). -- BayleShanks
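
The arithmetic in the experiment above can be checked without knowing the paragraph's word count, since the same text was both spoken and typed. A minimal sketch, assuming only the two timings and the "more than 60 wpm" typing speed given above; the function names are made up for illustration.

```python
def time_ratio(speak_seconds, type_seconds):
    """How many times longer typing took than speaking the same text."""
    return type_seconds / speak_seconds


def implied_speaking_wpm(typing_wpm, speak_seconds, type_seconds):
    """Speaking rate implied by a typing rate, for the same text.

    The word count cancels out, so only the two timings matter.
    """
    return typing_wpm * type_seconds / speak_seconds


SPEAK_SECONDS = 50            # spoken fast
TYPE_SECONDS = 2 * 60 + 45    # typed fast: 2:45

if __name__ == "__main__":
    print(time_ratio(SPEAK_SECONDS, TYPE_SECONDS))
    # 60 wpm is the stated lower bound on typing speed, so this is a
    # lower bound on the speaking rate.
    print(implied_speaking_wpm(60, SPEAK_SECONDS, TYPE_SECONDS))
```

Typing took 3.3 times as long as speaking, so a typing speed of at least 60 wpm implies a speaking rate of at least 198 wpm, which is above the 120 wpm typing figure.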

Anecdotally, voice recognition must have crossed some threshold recently; I keep running across telephone voice-response systems that want me to talk to them rather than pressing buttons on the phone. At first it was "press, or say, 'one'", but now it's just "when I say the option that you want, say 'yes'" or even "Now, what can I help you with?". (They're incredibly annoying, both because they hear me wrong a good fraction of the time, and because I feel like a complete idiot to be talking into a phone when there's no one at the other end. Sheesh!) -- DavidChess

|CategoryInterfaceDesign|
