Microsoft engineers spoke publicly for the first time yesterday about how they built the Kinect motion sensing system, offering a rare glimpse inside the secret world of product design.
After being taken to task for designing Xbox 360 game consoles that failed in large numbers, Microsoft turned a corner: Very few customers have returned the Kinect add-on for the Xbox. One of the reasons was the engineering discipline that the company applied in the wake of its defect scandal.
People are enjoying the system, which lets them control games with their body motions rather than standard game controllers.
At a chip conference yesterday, Microsoft engineers said they deliberately over-designed the Kinect system so that it could withstand anything that consumers could throw at it: hot temperatures, drops, careless shipping, abusive gamers, a sudden loss of power, and even power surges caused by lightning strikes. (To be clear, it won’t survive if hit by lightning. But if your house or electrical wires are hit by lightning and the power surges, then Kinect has a chance of surviving.) The result: the Kinect became the fastest-selling consumer electronics product in history, selling more than 8 million units in its first holiday selling season.
The talk at the Hot Chips conference at Stanford University was unique, as the Kinect engineers had never been allowed to talk about their labors publicly before.
Some people might have the impression that Microsoft simply bought the technology for Kinect, since it did in fact buy several companies that focused on detecting motion in a three-dimensional space. Despite those acquisitions, Microsoft actually did a lot of the engineering in-house. It had to pull together the expertise to create a high-level 3D sensing system — with gesture recognition, video and audio — that had to be dead-simple for consumers to use, and it had to build the system from scratch in 18 months. You can get a glimpse of the complexity from the picture at right.
“Kinect had to be approachable,” said Dawson Yee (pictured below right), one of the engineers on the Kinect team, which was part of the hardware design team at Microsoft in Seattle. “It had to be easy to set up and work like magic. You plug it in, and it works.”
Microsoft’s marketers helped a bit by showing the engineers who would be their target consumers. While Microsoft already had hardcore gamers, such as fans of Halo, it didn’t have the broader mass market of non-gaming consumers that the Nintendo Wii was getting. So the marketers went around the world visiting the homes of consumers to get a sense for what they wanted.
The Kinect team knew that they had to design an attractive product. If you’re going to put this system in front of your TV in the most visible spot in the living room, it has to look cool. So industrial designers were enlisted so that consumers could look at the product without doing a double take.
“It can’t look junky or overbearing,” Yee said.
On top of meeting consumer needs, Microsoft had to create the device so that it complied with regulations on wireless emissions and other sorts of strict safety guidelines. It also had to be better than the regulations. For instance, if a user put a finger on the Kinect and found that it was hot, that could be alarming, even though its temperature could very well be within the regulatory guidelines.
“You had to test it by dropping it on concrete,” said Yee, who has worked at Microsoft for 12 years and at Intel for a decade before that. “That was the level of robustness.”
One of the tough things was that Microsoft’s hardware engineers had to design the system out of thin air. They didn’t know exactly how software creators would use it. Nintendo had made the motion-sensing Wii, which depends on hand-held Wiimotes. But Microsoft had to design something with different kinds of sensors that were capable of sending signals around a room and then receiving them so that it could produce a digital representation of the spatial features of that room, without depending on remote controllers. The system had to see what was in the room, who was moving, and what they were doing. While the Wii could detect your arm motion, Kinect could go further. If you want to kick a ball in a Kinect game, you make a motion with your leg to kick a ball.
To that end, Microsoft acquired 3DV Systems, a motion-sensing chip maker. But it ultimately used a motion-sensing chip (the PS 1080 depth sensor, pictured in the schematic) from PrimeSense. Microsoft also licensed technology from GestureTek, which had patents in the space. All of that gave the company design freedom to create what it needed without encroaching on patents held by others. Just before Microsoft shipped, it bought another 3D sensing firm, Canesta.
The PrimeSense system uses a near-infrared laser diode to send out a signal into the room. PrimeSense encodes information in light patterns as the signal goes out, and the distortion of the pattern is what the receiving camera looks for to figure out depth. The camera gets the infrared light back and the processor creates an image of the room and the people in it. The chip deciphers the image and where it’s moving.
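To make the depth idea concrete, here is a minimal sketch of how a shift in a projected pattern can be turned into a distance by triangulation. It is an illustration only, not PrimeSense's actual algorithm: the simple pinhole model, the function name `depth_from_disparity`, and the focal length and emitter-to-camera baseline values are all assumptions chosen for the example.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Return the triangulated depth in meters for one pattern feature.

    disparity_px    -- how far the feature appears shifted in the image (pixels)
    focal_length_px -- camera focal length expressed in pixels
    baseline_m      -- distance between the emitter and the camera (meters)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    # Similar triangles: a nearer surface shifts the pattern more.
    return focal_length_px * baseline_m / disparity_px


# Hypothetical numbers, picked only to show the trend.
FOCAL_LENGTH_PX = 580.0
BASELINE_M = 0.075

if __name__ == "__main__":
    for disparity in (10.0, 20.0, 40.0):
        depth = depth_from_disparity(disparity, FOCAL_LENGTH_PX, BASELINE_M)
        print(f"shift of {disparity:.0f} px -> about {depth:.2f} m away")
```

Under this model depth is inversely proportional to the shift, so a small measurement error matters far more for a distant object than for a near one. That is one reason precise placement of the optics would matter.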
Microsoft’s own software finishes the image detection and sends it on as information to the game console. The camera can see any number of people in a room, but there could be limits on how many moving people can be tracked at a time. The sensor has to be responsive to fast movements, but it can’t consume too much power. The optics had to be exactly placed within the unit during the assembly process.
The difficult thing is that a lot of things can interfere with the signal. Ambient light can cause distortions in the measurements. And there are tough perception problems, like someone wearing a white T-shirt too close to the camera, or someone standing far from the camera in a dark shirt against a dark background.
“We would not have shipped a product if we did not solve every single one of these issues,” Yee said.
Since some of the technology, such as optics, was new to Microsoft, the company had to hire new engineers such as Scott McEldowney (pictured, left), who had 18 years of optical experience and who also gave the talk alongside Yee.
Some users have complained that the Kinect isn’t accurate enough or that its sweet spot in a room is too small for some living spaces. But for the most part, Kinect is remarkably accurate. To make it more accurate would have cost a lot more money without too much gain, said McEldowney.
“We knew this thing was going to be viewed as a toy and so it was going to be abused,” McEldowney said.
As it was, suppliers balked when Microsoft asked them for high-quality parts at low prices. The suppliers said that Microsoft was demanding the same quality as telecommunications gear, but for a low price. Haggling ensued.
One of the amazing things was that the suppliers were able to ramp up manufacturing so that Microsoft could ship 8 million units in a single season. That is unheard-of for a new product, and Sony’s rival PlayStation Move system shipped far fewer units.
Each system has to be calibrated before it ships, but the system also has to be capable of calibrating itself in tests with the user. The requirement for calibration meant that the system had to have a tilt motor which could automatically raise or lower the sensor. With the motor came more requirements for precision manufacturing and reliability. When you turn on Kinect, the first thing the camera does is look for the floor. When it finds the floor, it knows a user won’t be far away.
The system was “over-designed” to be more accurate than necessary because the engineers anticipated future applications that would need the accuracy. Microsoft anticipated people would hack the system and deliberately left the universal serial bus (USB 2.0) open. After shipping Kinect, Microsoft was surprised at the enthusiasm of the hackers who modified the system. Microsoft then shipped a software development kit to allow users to modify systems for their own applications.
The engineers didn’t have any example to follow, so they started by figuring out the limits of what was possible in terms of the laws of physics, the state of the art of the available technology, the cost target, the schedule, and the basic functions they thought the system would need. There were practical limitations on manufacturing and the supply chain that had to drive what the hardware people designed. Otherwise, they would never be able to ship a system on schedule for the right cost.
“We went as far as the material limits would allow us,” Yee said.
Part of the engineering problem was integrating a video camera, 3D sensor, and audio array into the product in the same confined space. The devices had to be placed precisely so they were synchronized and didn’t interfere with each other. The camera had to have a wide field of view, but it couldn’t introduce errors that could cause inaccurate depth estimates.
The audio array was a Microsoft Research invention with four microphones that enabled Microsoft to identify spoken words and tell the direction that a sound was coming from. The box includes a Marvell 88ap1 audio chip and a Texas Instruments TAS1020 motor controller. The audio has to be capable of handling speech commands, doing simultaneous voice chat between gamers, and video conferencing. For that purpose, Microsoft went with higher-quality wideband 24-bit audio. The quality had to be good enough so Kinect could pick up a quiet voice or a loud voice. The Marvell chip handled the audio.
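One common way a microphone array can tell where a sound comes from is to compare when the same sound reaches two microphones. The sketch below shows that textbook idea for a single pair. It is not a description of Microsoft's method: the far-field assumption, the function name `arrival_angle_deg`, and the microphone spacing in the example are all hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature


def arrival_angle_deg(delay_s, mic_spacing_m):
    """Estimate the angle of a distant sound source from one microphone pair.

    delay_s       -- how much later the sound reaches the second microphone
    mic_spacing_m -- distance between the two microphones (meters)
    Returns degrees off the straight-ahead direction; 0 means dead center.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    spacing = 0.2  # hypothetical spacing, not a Kinect specification
    for delay in (0.0, 0.0001, 0.0003):
        angle = arrival_angle_deg(delay, spacing)
        print(f"delay {delay * 1e6:.0f} microseconds -> {angle:.1f} degrees")
```

With more than two microphones, several pairs can be compared at once, which makes the estimate more robust to echoes and background noise.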
As the hardware plans came together, so did the team building the software and the marketing as well. The marketers focused their ads on the reactions of players who were playing it. Kudo Tsunoda, creative director for Kinect games, told us last year that the whole goal was to let people who were otherwise intimidated by controllers jump in and play.
“I worked on Kinect for three years, but the technology that Kinect is based on has been in the works for almost a decade with Microsoft research,” Tsunoda said last year. “It was in the last three years that we made it so anyone could use it and it could go into millions of homes.”
Because of all of the high-tech gear in the box, Microsoft didn’t exactly hit its target for costs. But it probably did OK with the Kinect in terms of the bottom line. An outside testing firm estimated that the cost of a Kinect system is about $56 for the hardware alone. Microsoft sells Kinect for $149.
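For a rough sense of what those two figures imply, the arithmetic can be worked through in a few lines. This is a back-of-the-envelope sketch using only the numbers quoted above. The difference is not profit, since it ignores research and development, assembly, shipping, marketing, licensing, and the retailer's cut, none of which are given here.

```python
HARDWARE_COST_USD = 56.0   # outside estimate quoted above
RETAIL_PRICE_USD = 149.0   # retail price quoted above


def hardware_share(cost_usd, price_usd):
    """Return the fraction of the retail price taken up by hardware cost."""
    return cost_usd / price_usd


if __name__ == "__main__":
    gap = RETAIL_PRICE_USD - HARDWARE_COST_USD
    share = hardware_share(HARDWARE_COST_USD, RETAIL_PRICE_USD)
    print(f"Price minus hardware cost: ${gap:.0f}")
    print(f"Hardware as a share of price: {share:.1%}")
```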
In conclusion, Yee said, “We nailed it.”