MeatballWiki | RecentChanges | Random Page | Indices | Categories

A difficult problem. Part of the ArtificialIntelligence researcher's set of difficult problems. We no longer have the Newton or the CrossPad? because it is so hard to do these things right.

(Well, that's not quite true. The CrossPad? used a ConnectorConspiracy?-lead dongle to connect the unit to the PC, and that dongle failed on me. The fact that you couldn't buy replacement parts soured me on it. It wouldn't be the first time a badly supported implementation soured the public on a technology.)

Anyway, the Newton did real HandwritingRecognition: there was a training period during which the unit learned your handwriting, and anyone else trying to use your Newton got gibberish because it had been trained for you. PalmOS does "false" HandwritingRecognition, where you, the user, get trained to write in Graffiti. That's what made PalmOS take off.
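To make the "train the user" idea concrete, here is a minimal Python sketch of a single-stroke recognizer in the spirit of Graffiti. It is not Palm's algorithm: it assumes each character is one stroke, reduces the stroke to a sequence of compass directions, and looks that sequence up in a fixed table. The names `STROKE_TABLE`, `stroke_to_directions`, and `recognize`, and the shapes in the table, are invented for illustration. Because the table never changes, nothing is learned about the writer; the writer has to learn the table.

```python
import math

# Invented stroke alphabet: each character is ONE stroke, written as the
# sequence of compass directions the pen travels. These are NOT the real
# Graffiti shapes; they only illustrate a fixed, user-learned alphabet.
STROKE_TABLE = {
    ("S",): "I",
    ("S", "E"): "L",
    ("E", "S"): "7",
    ("S", "E", "N"): "U",
}


def quantize(dx, dy):
    """Map a movement to N/E/S/W. Screen y grows downward, so dy > 0 is S."""
    if abs(dx) >= abs(dy):
        return "E" if dx > 0 else "W"
    return "S" if dy > 0 else "N"


def stroke_to_directions(points, min_len=10.0):
    """Turn a list of (x, y) pen samples into a tuple of directions.

    Movements shorter than min_len are accumulated rather than judged,
    which filters out pen jitter; repeated directions are merged.
    """
    dirs = []
    ax, ay = points[0]  # anchor: where the current movement started
    for x, y in points[1:]:
        dx, dy = x - ax, y - ay
        if math.hypot(dx, dy) < min_len:
            continue
        d = quantize(dx, dy)
        if not dirs or dirs[-1] != d:
            dirs.append(d)
        ax, ay = x, y
    return tuple(dirs)


def recognize(points):
    """Return the character for a stroke, or None if it isn't in the alphabet."""
    return STROKE_TABLE.get(stroke_to_directions(points))


# A stroke down and then right reads as "L".
print(recognize([(0, 0), (0, 20), (0, 40), (20, 40)]))  # prints: L
```

A stroke that isn't in the table returns None rather than a guess; under this design, any improvement in accuracy comes from the writer practicing the shapes, not from the software adapting.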

Could this be the real answer? We keep trying to use "real world" metaphors, which are rapidly becoming outdated. I access files on a computer far more often than files in a paper folder, yet we still use the paper folder for icons. For me, a desktop is the place where you put the computer. Why should I have a "desktop metaphor" for my computer's UI?

I can't read my own handwriting much of the time, so why should a computer be able to? I don't use handwriting for anything of substance anymore; why would I want to use it to talk to a computer? It's much more sensible to learn something the computer understands easily.

It works very well, except (as I mentioned) when the for-machines handwriting creeps into for-people uses; I have done that myself. (I've been using PalmOS PalmTops for about a year and a half.) Still, it does work very well. I'm not sure that we want to do VoiceRecognition this way, but then again, maybe we have to.


Speech recognition for dictation could be quite useful, but making a computer understand free-form spoken English (or any other human language) and correctly interpret it as commands doesn't seem like a good idea. At the very least you will end up heavily overloading parts of the vocabulary with special meanings. People have to learn the new interface no matter what. I don't know many people who think "file->open" when they pull out a piece of paper and pick up a pencil. Likewise, "file->save" doesn't map too well when you fold that paper up and put it in your back pocket.

Computers are complex, highly flexible machinery. You can't use something that complex without learning the correct techniques. Hell, operating a car is much simpler, and most people take quite a bit of organized training before they are qualified to drive one (and don't tell me it's intuitive that you press the pedal on the right to go faster, the one on the left to stop, and have to put it in "P" before you can remove the ignition key).

I think VoiceRecognition will inevitably end up using some kind of pidgin: the AI to convert free-form speech probably won't be ready before they finally nail recognizing the individual words (they're getting close, but they need 99.9999% accuracy before people will start trusting it with "format a:"). --ErikDeBill
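A rough calculation shows why per-word accuracy has to be so high before people trust spoken commands. Assuming, unrealistically, that each word is recognized correctly and independently with probability p, an n-word command comes through intact with probability p to the power n. The function name `p_all_correct` and the sample accuracies are illustrative, not measurements of any product.

```python
def p_all_correct(word_accuracy: float, n_words: int) -> float:
    """Chance that every word of an n-word command is recognized correctly,
    assuming word errors are independent (a simplification)."""
    return word_accuracy ** n_words


# Compare a few per-word accuracies on a five-word command.
for p in (0.95, 0.99, 0.999999):
    print(f"word accuracy {p}: five-word command intact {p_all_correct(p, 5):.6f}")
# word accuracy 0.95: five-word command intact 0.773781
# word accuracy 0.99: five-word command intact 0.950990
# word accuracy 0.999999: five-word command intact 0.999995
```

Even at 99% per word, roughly one five-word command in twenty would contain an error under this model, which is a poor rate for anything as destructive as "format a:".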

Some companies already use VoiceRecognition through your (cell) phone as their interface. It works pretty well, especially considering the audio-centric construction of a (cell) phone. "Pizza. Near Fifth and Bank." is fairly easy for a computer to figure out. I don't think a pidgin is necessary so much as a limited vocabulary. What's also interesting is how voice recognition systems would deal with accents, especially given the diverse population we have now. I would certainly enjoy trying one of those systems out on a good random sample of Ottawa's citizenry. -- SunirShah
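A minimal sketch of why a limited vocabulary makes requests like "Pizza. Near Fifth and Bank." tractable. It assumes the recognizer has already produced text (in practice the small vocabulary also constrains the recognizer itself) and simply spots known words. The vocabularies and the `parse_request` function are hypothetical.

```python
import re

# Hypothetical, deliberately tiny vocabularies: the system only has to pick
# among these words, not understand free-form English.
CATEGORIES = {"pizza", "coffee", "pharmacy", "taxi"}
STREETS = {"bank", "elgin", "fifth", "rideau"}


def parse_request(utterance):
    """Return (category, (street_a, street_b)), or None when the request
    falls outside the supported vocabulary."""
    words = re.findall(r"[a-z]+", utterance.lower())
    category = next((w for w in words if w in CATEGORIES), None)
    streets = [w for w in words if w in STREETS]
    if category is None or len(streets) != 2:
        return None  # out of vocabulary: re-prompt rather than guess
    return category, (streets[0], streets[1])


print(parse_request("Pizza. Near Fifth and Bank."))  # ('pizza', ('fifth', 'bank'))
print(parse_request("Find me something nice to eat."))  # None
```

Filler words such as "near" and "and" are simply ignored; everything depends on the request staying inside the expected vocabulary.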

My wife has a nice phone with a voice interface, but I've had some problems with it:

Phone: Name Please
Me: Hope Thrift Center (where my wife works)
Phone: Speak more quickly, please
Me: Hope Thrift Center
Phone: Speak more quickly, please
Me: Hope Thrift Center
Phone: Speak more slowly, please
Me: Hope Thrift Center
Phone: Dialing Home
Certainly, I know that this is a problem with one implementation of the technology, most likely an early one, and I expect that the next phone she brings home will have either an even nicer voice interface or none at all. I know that BeOS CEO JeanLouisGassee has had problems with AmericaCentrism? in VoiceRecognition software that can't accept his heavy French accent, and that he is working on a more general VoiceRecognition scheme for BeOS. I'm guessing that such software is also largely unable to handle the tonal variation of Asian languages, or the Central African glottal stop. Both problems will likely persist until computing, especially VoiceRecognition computing, is a larger part of those speakers' environment. -- DaveJacoby

Some people speculate that voice-to-text is needed to really open up most of the Asian computing markets. The current text input methods are painfully slow, often involving multiple menus of choices on the screen. Even a limited recognition system (which displays a small menu of possible words given a voice input) would be a great help. --CliffordAdams
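The "small menu of possible words" idea can be sketched as taking the recognizer's scored guesses and showing the best few as a numbered list. Here it is in Python, assuming a hypothetical recognizer that returns (word, score) pairs; the Mandarin characters below all share the syllable "ma", and their scores are made up.

```python
def candidate_menu(hypotheses, k=5):
    """Return up to k distinct candidates as numbered menu entries,
    best-scoring first. `hypotheses` is a list of (word, score) pairs."""
    ranked = []
    for word, _score in sorted(hypotheses, key=lambda h: h[1], reverse=True):
        if word not in ranked:  # skip duplicate hypotheses for the same word
            ranked.append(word)
        if len(ranked) == k:
            break
    return [f"{i}. {w}" for i, w in enumerate(ranked, start=1)]


# Made-up scores for one spoken syllable "ma".
guesses = [("马", 0.41), ("妈", 0.32), ("吗", 0.15), ("马", 0.05), ("骂", 0.07)]
for entry in candidate_menu(guesses, k=3):
    print(entry)  # 1. 马, then 2. 妈, then 3. 吗
```

The user then picks by number, so recognition only has to get the right word into the top few rather than into first place.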

Completely general speech recognition is a task beyond the ability of many humans. Current speech recognition technologies can be divided into non-trained and trained recognition. Non-trained recognition is like the cell phone examples: it is optimized to be usable by most people for many words, generally in a very limited context. For many people it fails completely.

Trained recognition uses a training session (usually 5-20 minutes) and requires relatively extreme computing power to process a complex speech model. (Even a high-end PC, as of late 2000, can have trouble keeping up with a user if the best speech models are used.) Trained recognition is still generally based on an average speech model, but good software will offer a choice of a few different models. (For instance, some programs let the user select between a male and a female speech model, which helps distinguish the different tones of voice.)

Modern continuous-speech recognition software (like IBM's ViaVoice or Dragon's NaturallySpeaking) is amazingly good for most people after training. I have had good results with both programs, although I have some difficulty because I have a relatively low/deep tone of voice. With these programs one speaks naturally (without any pauses between words) and simply pronounces punctuation like "comma", "period", "double-quote", etc. Most corrections can be handled by saying either "correct <word>" (to replace <word> with something else) or "scratch that" (to remove the last word or continuously-dictated phrase). Current programs have a very good grammar model, which allows them to choose the appropriate word even when different spellings are pronounced exactly the same way. (For example, "there" in "go there", following words like "go" or "to", vs. "their" in "their car", preceding a noun.)
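The dictation conventions described above (spoken punctuation, "scratch that" removing the last continuously-dictated phrase, and context deciding between same-sounding spellings) can be sketched as a post-processor over recognized phrases. This Python sketch is not how ViaVoice or NaturallySpeaking work internally: the punctuation table, the tiny bigram counts standing in for a real grammar model, and all function names are hypothetical, and "correct <word>" and capitalization are left out.

```python
# Hypothetical spoken-punctuation table.
PUNCTUATION = {"comma": ",", "period": ".", "question-mark": "?"}

# Toy bigram counts standing in for a real statistical grammar model.
BIGRAMS = {("go", "there"): 50, ("there", "are"): 60, ("their", "car"): 40}

# Assumed: the recognizer emits "there" for all three sound-alikes,
# and context picks the spelling.
HOMOPHONES = {"there": ("there", "their", "they're")}


def pick_spelling(prev, candidates, nxt):
    """Choose the spelling best supported by its neighbours (ties: first)."""
    def score(w):
        return BIGRAMS.get((prev, w), 0) + BIGRAMS.get((w, nxt), 0)
    return max(candidates, key=score)


def render_phrase(phrase):
    """Turn one phrase's spoken tokens into output tokens."""
    out = []
    for i, tok in enumerate(phrase):
        if tok in PUNCTUATION:
            out.append(PUNCTUATION[tok])
        elif tok in HOMOPHONES:
            prev = phrase[i - 1] if i > 0 else None
            nxt = phrase[i + 1] if i + 1 < len(phrase) else None
            out.append(pick_spelling(prev, HOMOPHONES[tok], nxt))
        else:
            out.append(tok)
    return out


def dictate(phrases):
    """`phrases` is a list of phrases, each a list of lowercase tokens
    spoken without a pause. Returns the final text."""
    kept = []
    for phrase in phrases:
        if phrase == ["scratch", "that"]:
            if kept:
                kept.pop()  # drop the last continuously-dictated phrase
            continue
        kept.append(render_phrase(phrase))
    text = ""
    marks = set(PUNCTUATION.values())
    for tok in (t for p in kept for t in p):
        # Punctuation attaches to the previous word; words get a space.
        text += tok if (tok in marks or not text) else " " + tok
    return text


print(dictate([
    ["go", "there", "period"],
    ["i", "like", "there", "car"],
    ["scratch", "that"],
    ["there", "are", "two", "comma", "maybe", "three", "period"],
]))  # prints: go there. there are two, maybe three.
```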

For typical English text I could often reach 60 WPM even with corrections. One really interesting feature is that technical terms like "capacitor" or "thermoresistor" are recognized quite accurately with only minimal training. Speech recognition software is very popular in the legal and medical fields, which have large, jargon-filled vocabularies (although there it is usually used to assist the person transcribing the audio recording).

The words that are hard to recognize are the most common ones, which people tend to be sloppy about. (For example, ViaVoice had a terrible time with my pronunciation of "there are", which I often speak like "therrr", with a long r sound emphasized partway through. I finally mostly fixed the problem by defining my pronunciation as a speech-macro command.) Sometimes I listen to the raw audio recording of my recognized words and wonder whether even another human could recognize them. Don't even ask about my handwriting. :-) --CliffordAdams

My uncle is a pharmacist, and he has said that most drug names are recognizable even in doctors' notoriously illegible script, and the few that aren't can usually be derived from context (if you know the illness, you mostly know the solution), but he was often forced to ask which doctor had prescribed the medication. Highly technical and Latinate words are often easy to figure out: Antidisestablishmentarianism breaks down into "anti", "dis", "establish", "ment", "arian", and "ism", which are ordinary prefixes and suffixes around a common word, and thermoresistor breaks into "thermo" and "resistor", so it would also be an easy pattern match. Non-scientific words such as Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch or Cymru would be more of a problem. --DaveJacoby
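The prefix/suffix pattern matching described above can be sketched as greedy affix stripping against small lists of known affixes and roots. This is a toy Python illustration, not a real morphological analyzer; the affix and root lists are hypothetical and only large enough for the examples in the text. Requiring the leftover piece to be a known root keeps greedy stripping from accepting nonsense, and it is also why the Welsh names get no decomposition at all.

```python
# Hypothetical mini-lexicon, just large enough for the examples above.
PREFIXES = ["anti", "dis", "thermo"]
SUFFIXES = ["ism", "arian", "ment"]
KNOWN_ROOTS = {"establish", "resistor"}


def split_morphemes(word):
    """Greedily strip known prefixes, then known suffixes. Return the list
    of pieces, or None if what remains is not a known root."""
    rest = word.lower()
    front, back = [], []

    stripped = True
    while stripped and rest not in KNOWN_ROOTS:
        stripped = False
        for p in PREFIXES:
            if rest.startswith(p) and len(rest) > len(p):
                front.append(p)
                rest = rest[len(p):]
                stripped = True
                break

    stripped = True
    while stripped and rest not in KNOWN_ROOTS:
        stripped = False
        for s in SUFFIXES:
            if rest.endswith(s) and len(rest) > len(s):
                back.append(s)
                rest = rest[: -len(s)]
                stripped = True
                break

    if rest not in KNOWN_ROOTS:
        return None  # no decomposition into familiar pieces
    return front + [rest] + back[::-1]  # suffixes were removed outermost-first


print(split_morphemes("Antidisestablishmentarianism"))
# ['anti', 'dis', 'establish', 'ment', 'arian', 'ism']
print(split_morphemes("thermoresistor"))  # ['thermo', 'resistor']
print(split_morphemes("Cymru"))  # None
```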

(Remember, people! "Speech", not "Speach"--DaveJacoby) [What's the problem? They sound the same to me. --CliffordAdams :-] Funny--DaveJacoby B)

CategoryInterfaceDesign CategoryPervasiveComputing

