First of all - you may all be quite pleased that I have now demonstrated the mobile shopping list manager which means you'll no longer be hit by my frustrated posts about being a fail-programma! The bad news is that I have other aspects of university that will be eating my time instead of that. Ontologies, Natural Gestures in Usability Studies and Rootkits, oh my!
What doesn't kill you only makes you stronger, right?
Today's academic-related ramble is about Natural Gesture input for improved human-computer interaction. Here I am doing what icelore is guilty of doing - sharing academic pieces on my livejournal because my greedy little eyeballs like reading academic/well-researched pieces (look into thalestral's pieces on female comic book characters!) which aren't related to computer science, so I'm going to throw something I find interesting and am knowledgeable about right back atcha.
So, earlier I spoke about how uncomfortable a research paper published by ACM, "From Brains to Bytes", made me feel when I heard the presentation and subsequently read the paper itself. For this class, within my group, we've taken a less invasive research approach (Wii remotes!) to investigate a more natural way of interacting with computers that doesn't involve keyboard/mouse combination devices. The 6-week research project is a drive toward the realisation of Google's 2011 April Fools.
First: watch this. It's pretty cool.
Why are Gestures an Improvement on Traditional Input Methods?
Outline
- Traditional methods of interaction aren't natural - they are learned.
- Investigations into alternative methods of interaction are often task-specific (pen input, touch-screens) and still learned. Apple have spent the past few years teaching their users how to use iOS4 in preparation for the launch of the iPad, through the iPod touch and iPhone. The gestures associated with these devices are likewise not natural, but they are iconic or even intuitive, and the product of developing mental maps.
- Moore's law - computers have been getting smaller, better, faster, stronger, but people remain the same, with the same limitations on how they can interact with computers. Perhaps the direction to take now is toward improving interaction through task-specific computers, as opposed to making them even faster.
- Humans naturally interact using a combination of speech and gesture. These gestures are a mix of autonomous gestures (independent of speech) and gesticulation (gestures made in association with speech).
- Gestures are ideal for direct object manipulation while natural language is suited for descriptive tasks. -- Weatherman study!
- Gesture recognition (associating a gesture with a specific command/meaning) reaches 90% accuracy using Hidden Markov Models (HMMs) where the gestures are given syntactic and grammatical rules for the user to adhere to. This isn't natural at all, but it increases success. What we're trying to focus on are deictic gestures - natural gestures that come through to support speech. (There's a rough sketch of the HMM approach just after this outline.)
- Current existing studies involve gesture and speech where speech is involved in directing the gesture. "Put that there" - the use of false-computational experiments to see what kind of gestures users will come up with (once a human understood the intent of the gesture, they would manually respond from another computer in a hidden room).
- Implementation of an environment which allows gestures to be detected in their true (3D) form as opposed to 2D representation gained from Kinect-like devices.
- Feedback is a major factor in natural gesture input to overcome the complications of gestural input.
- Gestures are not universal. Hard to generalise and generally a pain in the ass.
- Will they ever be truly natural?
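(For the technically curious, here's a rough sketch of the HMM approach mentioned in the outline: train one model per gesture, then label a new movement with whichever model scores it highest. The hmmlearn library and the accelerometer-style features are my own assumptions for illustration, not necessarily what our project will use.)

```python
# Sketch of HMM-based gesture classification: train one Gaussian HMM per
# gesture class, then label a new sequence by whichever model scores it highest.
# Assumes the hmmlearn library; the (x, y, z) accelerometer feature layout is
# an illustrative choice, not our project's actual pipeline.
import numpy as np
from hmmlearn import hmm

def train_gesture_models(training_data, n_states=5):
    """training_data maps gesture name -> list of (T_i, 3) sample arrays."""
    models = {}
    for gesture, sequences in training_data.items():
        X = np.concatenate(sequences)              # stack all example sequences
        lengths = [len(seq) for seq in sequences]  # hmmlearn needs per-sequence lengths
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[gesture] = model
    return models

def classify(models, sequence):
    """Return the gesture whose HMM gives the observed sequence the highest
    log-likelihood."""
    return max(models, key=lambda g: models[g].score(sequence))
```

The syntactic/grammatical constraint mentioned above effectively means users are trained to produce movements the models expect, which is why accuracy goes up even though naturalness goes down.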
Gestures are perceived as an image dialectic that fuels speech and thought; often gestures convey a meaning which the speaker cannot express through words (David McNeill, "Gesture and Thought"). Mapping gestures directly to computer commands is difficult for computer scientists because many gestures are learned. People may also use two different gestures to mean the same thing (for example, signalling "stand up" with one arm or with both), which makes the generalisation of gestures extremely difficult.
Traditional interaction with a computer is often through the use of a mouse and a keyboard. Although this method of human-computer interaction is familiar to many computer users, it is not a natural way for humans to communicate compared with speech and gesture. Computers, with their familiar interfaces, have been following Moore's Law for many years, becoming faster, smaller and better each year. People, however, have remained the same. This has motivated many researchers to look for ways to make interacting with computers more user-friendly by approaching it in a more human way.
Although there is already research in the domain of computer interaction using "natural" gestures [JCL], much of this research involves capturing the three-dimensional environment within which the user resides. This form of gesture detection captures the subtleties between two gestures that would appear identical in a 2D environment but, once a z-axis is observed, are revealed to be different. It also requires the user to be standing within view of all sensors for gestures to be correctly identified.
An inadequacy within the domain of gesture detection and recognition is the inherent difficulty of generalising gestures to perform given actions. Users will often adopt different gestures to perform the same task (for example, signalling "up" with one arm or with both). This also raises the question of whether height, weight, posture and other physical and cultural attributes of the user affect how gestures are recognised by the camera. Once a gesture is calibrated for one type of person, will someone with shorter arms be able to perform the same gesture and have the camera detect it in the same way, or will the camera consider the two gestures to be different?
In recent years the abundance of devices containing valuable sensory equipment has risen dramatically with the games console market thriving due to advances in interaction technology. One of these devices, the Nintendo Wii, provides controllers with accelerometer and gyroscopic information. Numerous projects have been undertaken to adapt these controllers for use in other novel systems [1].
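To give a feel for how that sensor data might be used, here's a rough sketch of segmenting a continuous accelerometer stream into candidate gestures before classification. read_sample() is a hypothetical stand-in for whatever Wiimote library ends up being used, and the thresholds are made-up values, not anything from our project.

```python
# Rough sketch: split a continuous accelerometer stream into candidate gesture
# windows by watching for bursts of movement energy above a rest threshold.
# read_sample() is a hypothetical callable returning one (x, y, z) reading;
# REST_THRESHOLD and MIN_LENGTH are illustrative values only.
import numpy as np

REST_THRESHOLD = 0.15   # movement energy below this counts as "at rest" (made-up value)
MIN_LENGTH = 10         # ignore blips shorter than this many samples

def segment_gestures(read_sample):
    """Yield (T, 3) arrays of accelerometer samples, one per detected gesture."""
    window = []
    baseline = np.array(read_sample())      # assume the remote starts at rest
    while True:
        sample = np.array(read_sample())
        energy = np.linalg.norm(sample - baseline)
        if energy > REST_THRESHOLD:
            window.append(sample)           # movement: keep accumulating
        elif window:
            if len(window) >= MIN_LENGTH:
                yield np.array(window)      # movement ended: emit the gesture
            window = []
```

Each emitted window could then be handed to the classify() function from the earlier HMM sketch.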
Further research by “Sixth Sense” [MIS] illustrated the use of learned gestures and poses, which when detected, signal a command to a portable computer.
Wobbrock et al. [WOB] identify a method for maximising the guessability of symbolic input. Following such a process should allow us to produce an optimal gesture set which feels natural and common to users.
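As I understand that methodology, the core of it is an agreement score: for each referent (command), you group identical gesture proposals from participants and sum the squared proportion of each group, so higher scores mean users converged on the same gesture. Here's a toy sketch; the data layout and function names are mine, not from [WOB].

```python
# Toy sketch of an agreement score for a guessability study: for each referent
# (command), group identical gesture proposals and sum the squared proportion
# of each group. Higher agreement means participants converged on one gesture.
from collections import Counter

def agreement(proposals_per_referent):
    """proposals_per_referent maps a referent (e.g. 'stand up') to the list of
    gesture labels proposed by participants for that referent."""
    scores = {}
    for referent, proposals in proposals_per_referent.items():
        total = len(proposals)
        groups = Counter(proposals)  # identical proposals form a group
        scores[referent] = sum((count / total) ** 2 for count in groups.values())
    overall = sum(scores.values()) / len(scores)
    return overall, scores

# Example: three of four people proposed the same gesture for 'stand up'.
overall, per_referent = agreement({
    "stand up": ["raise one arm", "raise one arm", "raise one arm", "raise both arms"],
    "sit down": ["lower arm", "palm-down push", "lower arm", "crouch"],
})
print(per_referent)  # {'stand up': 0.625, 'sit down': 0.375}
print(overall)       # 0.5
```

The point for us is that a score like this gives a concrete way to pick, from everything participants come up with, the gesture set that feels most natural to the most people.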