Multi-Modal Computing – What it could mean

The multi-modal world, as envisioned back in the day at Adobe.

In 1998, there were research groups looking at multi-modality, and by 2000, folks involved in standards creation were already thinking about multi-modal inputs. Today, Google has a group devoted to multi-modal inputs, although the Wiki is a little bare.

There is also new work on this topic over at IBM. CTG (from whom I borrowed the graphic accompanying this post) specializes in multi-modal input computing.

Facial expression and gesture inputs

Now that we have finger inputs, what about facial expressions? My wife said I was “smug” the other day. How did she read that expression? Could a computer read such a subtle expression? Or could it only tell sadness from smiling (which even small children find difficult)? It seems to me that we are increasingly trying to create computer interfaces that can understand what we want, almost without our speaking. Thus, my question about facial expressions.

With the new gesture inputs on the iPhone (and on the Mac touch-pad on the newer OS X machines), we are getting closer and closer to naturalistic human behavior. Of course, when I worked on the Corel Grafigo products for the Tablet PC, Microsoft had these ideas, but didn’t have the kind of execution team that Apple has built to productize the concept correctly.

Charles Stross wrote a book called Iron Sunrise. In this admittedly futuristic book, humans use computers through items that don’t have “connectivity” right now: rings, jewelry, glasses, etc. (implants too!), all of which convey information. Is this doable? A first step would be to build a pair of glasses that are location-aware. The next natural step would be to layer online identity onto offline stuff. More on this topic to come soon.

Gesture interfaces & the iPhone

An interesting posting on how to speed up iPhone input by allowing users to “write” using handwriting:

There is one way that I think Apple could make input on the iPhone incredibly quick. And that would be to implement a fingertip-driven variant on Graffiti, the simplified handwriting recognition system that was the primary way to get text into Palm devices for years. (Graffiti started out as a third-party app for Apple’s Newton, incidentally–so if it, or something like it, ran on an iPhone, it would be a sort of homecoming.)

As implemented by Palm on its handhelds, you scrawled Graffiti characters into a little box on the bottom of the screen. But third-party apps such as Graffiti Anywhere let you use the entire screen, which, for me at least, improved accuracy dramatically. I’m not sure if any mobile input solution has ever let me get text into a handheld as quickly.

So I’m thinking that the iPhone’s giant-sized screen would allow for extremely fast, accurate input–and the fact that you’d be doing it with a fingertip rather than a stylus might help. And the precision required to hit the iPhone’s teeny tiny keys would be a non-issue.

What that multi-modal world would take to build: a complete structured authoring engine that uses a paradigm like XML and XSLT to hold content, takes inputs through multiple modes, and outputs that content in whichever mode is “on” and best suited to the scenario or the user’s behavior.
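To make the single-source idea a little more concrete, here is a rough sketch in Python of just the output half: one XML document, rendered differently depending on which mode is active. Everything in it is an assumption made for illustration. The `message` schema, the mode names (`screen`, `speech`, `glance`) and the renderer functions are hypothetical, and plain Python functions stand in for XSLT stylesheets because Python's standard library does not include an XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-source content; the element and attribute names
# are invented for this sketch and do not come from any real schema.
SOURCE = """
<message priority="high">
  <title>Meeting moved</title>
  <body>The design review now starts at 3 pm.</body>
</message>
"""


def _fields(msg):
    """Pull the title and body text out of a parsed <message> element."""
    title = msg.findtext("title", "").strip()
    body = msg.findtext("body", "").strip()
    return title, body


def render_screen(msg):
    """Visual mode: emit a small HTML fragment."""
    title, body = _fields(msg)
    return f"<h1>{title}</h1><p>{body}</p>"


def render_speech(msg):
    """Audio mode: emit one sentence-like string for a text-to-speech engine."""
    title, body = _fields(msg)
    prefix = "Urgent. " if msg.get("priority") == "high" else ""
    return f"{prefix}{title}. {body}"


def render_glance(msg):
    """Tiny-display mode (a ring or glasses, say): title only, truncated."""
    title, _ = _fields(msg)
    return title[:20]


# Each function plays the role an XSLT stylesheet would play
# in the XML/XSLT version of this idea.
RENDERERS = {
    "screen": render_screen,
    "speech": render_speech,
    "glance": render_glance,
}


def render(xml_text, active_mode):
    """Parse the shared source and render it for whichever mode is 'on'."""
    msg = ET.fromstring(xml_text.strip())
    renderer = RENDERERS.get(active_mode, render_screen)  # fall back to screen
    return renderer(msg)


if __name__ == "__main__":
    for mode in ("screen", "speech", "glance"):
        print(mode, "->", render(SOURCE, mode))
```

The point of the sketch is only that the content is authored once and the mode is chosen at output time. A real engine would also need the input half (speech, gesture, handwriting) feeding the same structured source.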

Ned Hayes

CEO | Leader | Author
