(Well, that's not quite true. CrossPad? used a ConnectorConspiracy?-lead dongle to connect the unit to the PC, and that dongle failed on me. The fact that you couldn't buy pieces sort of soured me on it. It wouldn't be the first time a badly-supported implementation soured the public on a technology.)
But anyway, the Newton did real HandwritingRecognition, and there was a training period when the unit learned your handwriting. And anyone trying to use your Newton wrote gibberish because it was trained for you. PalmOS does false HandwritingRecognition, where you, the user, get trained to write in Graffiti. That's what made PalmOS take off.
It works very well, except (as I mentioned) when the for-machines handwriting slips into for-people uses. I have done it. (I've been using PalmOS PalmTops for about a year and a half.) I'm not sure that we want to do VoiceRecognition this way, but then again, maybe we have to.
Some companies already use VoiceRecognition through your (cell) phone as their interface. It works pretty well, especially considering the audio-centric construction of a (cell) phone. "Pizza. Near Fifth and Bank." is fairly easy for a computer to figure out. I don't think a pidgin is necessary so much as a limited vocabulary. What's also interesting is how the voice recognition systems would deal with accents, especially given the diverse population we have now. I would certainly enjoy trying one of those systems out on a good random sampling of Ottawa's citizenry. -- SunirShah
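The limited-vocabulary idea can be sketched in code. This is a hypothetical illustration (the service names, marker words, and slot structure are mine, not any real phone system's): after speech-to-text, the system only has to match recognized words against a small fixed vocabulary instead of parsing open-ended English.

```python
# Hypothetical sketch of a limited-vocabulary command parser, the kind a
# phone voice service might run on the text produced by its recognizer.
# The vocabulary and marker words below are illustrative assumptions.

CATEGORIES = {"pizza", "taxi", "weather"}   # assumed service vocabulary
LOCATION_MARKERS = {"near", "at", "on"}     # assumed location cue words

def parse_command(utterance: str) -> dict:
    """Extract a service category and an optional location phrase."""
    words = [w.strip(".,").lower() for w in utterance.split()]
    result = {"category": None, "location": None}
    for i, word in enumerate(words):
        if word in CATEGORIES and result["category"] is None:
            result["category"] = word
        elif word in LOCATION_MARKERS and result["location"] is None:
            # Everything after the marker is treated as the location.
            result["location"] = " ".join(words[i + 1:])
    return result

print(parse_command("Pizza. Near Fifth and Bank."))
# -> {'category': 'pizza', 'location': 'fifth and bank'}
```

Because the vocabulary is tiny, a recognizer only has to distinguish a handful of candidate words, which is why this works even over a noisy phone line where general dictation would fail.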
My wife has a nice phone with a voice interface, but I've had some problems with it.
Completely general speech recognition is a task beyond the ability of many humans. The current speech recognition technologies can be divided into non-trained and trained recognition. Non-trained recognition is like the cell phone examples--it is optimized to be usable by most people for many words, generally in a very limited context. For many people it fails completely.
Trained recognition uses a training session (usually 5-20 minutes), and requires considerable computing power to process a complex speech model. (Even a high-end PC (as of late 2000) can have trouble keeping up with a user if the best speech models are used.) Trained recognition is still generally based on an average speech model, but good software will give a choice of a few different models. (For instance, some programs let the user select between a male and a female speech model, which helps account for different tones of voice.)
Modern continuous-speech recognition software (like IBM's ViaVoice or Dragon NaturallySpeaking) is amazingly good for most people after training. I have had good results with both programs, although I have some difficulty because I have a relatively low/deep tone of voice. With these programs one speaks naturally (without any pauses between words) and simply pronounces punctuation like "comma", "period", "double-quote", etc. Most corrections can be handled by saying either "correct <word>" (to replace <word> with something else), or "scratch that", to remove the last word or continuously-dictated phrase. Current programs have a very good grammar model which allows them to choose the appropriate word even when different spellings are pronounced exactly the same way. (Like "there" in "go there" (following words like "go" or "to") vs. "their car" (preceding a noun).)
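The homophone-choosing behavior described above can be sketched with a toy word-pair (bigram) model. This is a minimal illustration of the general technique, not ViaVoice's actual model; the counts and word lists are made up for the example.

```python
# Minimal sketch of grammar-model homophone disambiguation: score each
# candidate spelling by how well it fits the neighboring words, using a
# tiny hand-made bigram table. Real recognizers train far richer models
# on large text corpora; these counts are illustrative assumptions.

BIGRAMS = {
    ("go", "there"): 50,
    ("go", "their"): 1,
    ("their", "car"): 40,
    ("there", "car"): 1,
}

HOMOPHONES = {"there": ["there", "their", "they're"]}

def choose_spelling(prev_word, sound, next_word=None):
    """Pick the candidate spelling that best fits its context."""
    candidates = HOMOPHONES.get(sound, [sound])
    def score(w):
        # Sum how often the candidate follows prev_word and precedes next_word.
        return (BIGRAMS.get((prev_word, w), 0)
                + BIGRAMS.get((w, next_word), 0))
    return max(candidates, key=score)

print(choose_spelling("go", "there"))          # -> there
print(choose_spelling("the", "there", "car"))  # -> their
```

This mirrors the article's example: after "go" the spelling "there" wins, while before a noun like "car" the possessive "their" wins, even though both sound identical.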
For typical English text I could often reach 60 WPM even with corrections. One really interesting feature is that technical terms like "capacitor" or "thermoresistor" are recognized quite accurately with only minimal training. Speech recognition software is very popular in the legal and medical fields which have large jargon-filled vocabularies (although it is usually used to assist the person transcribing the audio recording).
The hard words to recognize are the most common ones that people tend to be sloppy about. (For example, ViaVoice had a terrible time with my pronunciation of "there are", which I often speak like "therrr" (with a long-r sound, emphasized partway through). I finally mostly fixed the problem by defining my pronunciation as a speech-macro command.) Sometimes I listen to the raw-audio recording of my recognized words and wonder if even another human could recognize them. Don't even ask about my handwriting. :-) --CliffordAdams
My uncle is a pharmacist, and he says that most drug names are recognizable even in the regularly illegible script of doctors, and the few that aren't are context-derivable (if you know the illness, you mostly know the solution), but he was often forced to ask which doctor prescribed the medication. Highly technical and Latinate words are often easy to figure out (antidisestablishmentarianism breaks down to "anti" "dis" "establish" "ment" "arian" "ism", which is normal prefixes and suffixes around a common word; thermoresistor would break into "thermo" and "resistor", so that'd also be an easy pattern match), but nonscientific words such as llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch or Cymru would be more of a problem. --DaveJacoby
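The affix pattern-matching Dave describes can be sketched directly: peel known prefixes and suffixes off a word until a dictionary root remains. The affix lists and root dictionary below are illustrative assumptions, not a real morphological analyzer.

```python
# Rough sketch of affix decomposition: greedily strip known prefixes and
# suffixes, then check whether what's left is a known root word.
# The affix lists and ROOTS set here are tiny illustrative samples.

PREFIXES = ["anti", "dis", "thermo"]
SUFFIXES = ["ism", "arian", "ment"]
ROOTS = {"establish", "resistor"}

def decompose(word):
    """Return the word's affix parts, or None if no known root is found."""
    prefixes, suffixes = [], []
    changed = True
    while changed:
        changed = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) > len(p):
                prefixes.append(p)
                word = word[len(p):]
                changed = True
        for s in SUFFIXES:
            if word.endswith(s) and len(word) > len(s):
                suffixes.insert(0, s)  # keep suffixes in word order
                word = word[:-len(s)]
                changed = True
    return prefixes + [word] + suffixes if word in ROOTS else None

print(decompose("antidisestablishmentarianism"))
# -> ['anti', 'dis', 'establish', 'ment', 'arian', 'ism']
print(decompose("thermoresistor"))
# -> ['thermo', 'resistor']
print(decompose("llanfairpwllgwyngyll"))
# -> None (no affixes or root match)
```

This is why technical coinages pattern-match easily while Welsh place names don't: the former are built from a small shared affix inventory, the latter from morphology the English-trained system has never seen.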
CategoryInterfaceDesign CategoryPervasiveComputing