Issues with speaking out

Mar 13, 2009 02:25


Originally published at The VidZone Network Blog. Please leave any comments there.

The big project I worked on during my time at FIEA (Florida Interactive Entertainment Academy) was a voice-controlled vehicular combat game that casts the player as the ship’s captain. We ultimately titled it Zephyr: Tides of War. (Follow that link to download and play it. It requires WinXP and is not Vista compatible.) We were fortunate that the school believed in simulating a true working environment, down to the tools provided, so we were able to build on industry middleware including Gamebryo and Fonix VoiceIn.



I was leading the charge on voice-related design, including what commands would be available to the player as well as how the crew would respond both to your input and to things happening within the game world. One of the things we really aimed for was to make the most of such a natural input as voice and not just make it a different way to push a button. To that end, we labored to think of all sorts of variations and permutations of things that the player might say. For example, would the player use nautical terms like bow, stern, aft, port, starboard, etc.? Or would they want to use layman’s terms like forward, back, left, right? How many ways can you say “move”? Go, forward, move, full speed ahead, set sail, etc. And of course there were gradations for different rates of movement or turning.
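To make the idea of variations concrete, here is a minimal Python sketch of a synonym table that maps many spoken phrases onto a handful of canonical commands. It is not the game's actual data or code: the command names (MOVE_FORWARD, TURN_LEFT, TURN_RIGHT), the phrase lists beyond the examples above, and the helper functions are all assumptions made up for illustration.

```python
# Hypothetical synonym table: canonical command -> phrases a player might say.
# Command names and most phrases are illustrative assumptions.
SYNONYMS = {
    "MOVE_FORWARD": ["go", "forward", "move", "full speed ahead", "set sail"],
    "TURN_LEFT": ["left", "port", "turn left", "hard to port"],
    "TURN_RIGHT": ["right", "starboard", "turn right", "hard to starboard"],
}


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def build_lookup(synonyms):
    """Invert the table into phrase -> command, rejecting ambiguous phrases."""
    lookup = {}
    for command, phrases in synonyms.items():
        for phrase in phrases:
            key = normalize(phrase)
            if key in lookup and lookup[key] != command:
                raise ValueError(f"phrase {key!r} maps to two commands")
            lookup[key] = command
    return lookup


def resolve(utterance, lookup):
    """Return the canonical command for an utterance, or None if unknown."""
    return lookup.get(normalize(utterance))


LOOKUP = build_lookup(SYNONYMS)
```

With a table like this, resolve("Full speed ahead", LOOKUP) and resolve("set sail", LOOKUP) both land on the same command, so nautical and layman's phrasings can coexist. The ambiguity check in build_lookup is a design choice of this sketch: it fails early if one phrase is accidentally listed under two commands.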

I thought it would be a great opportunity to free the player from the usual learning curve of memorizing what button does what. Look at a computer keyboard: there are about 104 keys there, and modern video game controllers have almost 20 buttons and knobs. Who ever decided that pressing A would mean jump? It’s so arbitrary, and it’s a dark art to cram functionality onto a random mess of buttons. With voice, ideas are easily and instantly translated from thought. By accounting for a fair number of common synonyms, we aimed to recognize any command by the third variation that a person might try. No tutorial would be necessary, and the software promised that no annoying individual voice-training would be needed.

Of course, things never go according to plan. Despite the elaborate spreadsheets we made of various ways to say commands, the software just couldn’t process them reliably enough. One of our programmers created a tool that would run in the background while playing the game. It would listen to the commands players issued and output a log of what command the software thought the player was saying (cross-referenced against a database of phrases to look for that we fed to the game), a numerical percentage of how close the match was, and, if the match fell below a certain threshold, other phrases that sounded similar. Crushingly, our acceptable command list was chiseled down to something fairly bare-bones and far from robust.
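The kind of log line such a tool produces can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool our programmer wrote and not the Fonix VoiceIn API: the Candidate type, the format_log_entry function, the 75% threshold, and the idea that the recognizer hands back a list of scored candidates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One phrase the recognizer considered (hypothetical structure)."""
    phrase: str
    confidence: float  # match quality from 0.0 to 1.0


def format_log_entry(candidates, threshold=0.75):
    """Describe the best match; list the alternatives if it is a weak match.

    The threshold of 0.75 is an arbitrary value chosen for this sketch.
    """
    if not candidates:
        return "no match"
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    best = ranked[0]
    line = f'heard "{best.phrase}" ({best.confidence:.0%} match)'
    # Only show similar-sounding phrases when the best match is below threshold.
    if best.confidence < threshold and len(ranked) > 1:
        alternatives = ", ".join(
            f'"{c.phrase}" ({c.confidence:.0%})' for c in ranked[1:]
        )
        line += f"; similar: {alternatives}"
    return line
```

Reading a session's worth of lines like these makes it easy to spot phrases that rarely clear the threshold or that keep appearing as each other's alternatives, which is the sort of evidence that pushed us to cut commands.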

How did this happen? Words are constructed from phonemes, the individual sound components that make up a word. There’s a relatively limited set of them in any given language. That’s why, in spy movies like Mission: Impossible, they carry a card with a pre-constructed set of sentences that contain all possible phonemes. When recorded and dissected, those phoneme samples can be fed into a vocal replicator that allows a spy to mimic someone else’s voice. Anyway, our software was tripping up and getting confused by many phrases. “Ascend” and “descend” sound like common words you should have in a game that requires you to navigate in 3D space, right? Sorry, they sound too similar. And unfortunately, we couldn’t just tell the computer to make an educated guess: the two actions are direct opposites, so there’s an extremely high likelihood that the resulting action would not be what the player intended.
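One way to see why “ascend” and “descend” collide is to compare them phoneme by phoneme. The Python sketch below is an assumption-laden illustration, not how Fonix VoiceIn or our game measured similarity: the phoneme spellings are hand-written approximations in an ARPAbet-like notation, the extra words “climb” and “dive” are hypothetical alternatives, and the 0.6 cutoff is arbitrary. It uses plain edit distance over phoneme lists as a stand-in for acoustic similarity.

```python
from itertools import combinations

# Hand-written, approximate phoneme spellings (an assumption of this sketch).
PHONEMES = {
    "ascend": ["AH", "S", "EH", "N", "D"],
    "descend": ["D", "IH", "S", "EH", "N", "D"],
    "climb": ["K", "L", "AY", "M"],
    "dive": ["D", "AY", "V"],
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    previous = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        current = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            current.append(min(
                previous[j] + 1,         # delete a phoneme from a
                current[j - 1] + 1,      # insert a phoneme from b
                previous[j - 1] + cost,  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Score from 0.0 (nothing shared) to 1.0 (identical sequences)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def confusable_pairs(phonemes, cutoff=0.6):
    """Return word pairs whose phoneme similarity is at or above the cutoff."""
    return [
        (w1, w2)
        for w1, w2 in combinations(phonemes, 2)
        if similarity(phonemes[w1], phonemes[w2]) >= cutoff
    ]
```

Under these spellings, “ascend” and “descend” differ by only two edits out of six phonemes, while “climb” and “dive” share almost nothing with either word or with each other. A check like this, run over a whole command list, hints at which opposite-meaning commands are risky to keep as a pair.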

As you can see above, since we ended up with a much shorter list of commands, we put them on a slide-out bar that the player could toggle if they forgot what they could say. It’s a shame; we were going for a more minimal, voice-based UI where the crew’s shouts and the ship’s visual state would intuitively tell you all you needed to know.