IBM is basking in the glow of the victory of its Watson supercomputer over the Jeopardy TV show’s best human players.
But watching the supercomputer answer questions during the three nights of the Jeopardy shows was puzzling to a lot of viewers, who could see how awesome Watson could be at some questions and how terrible it was at others. John Prager, one of the 25 researchers who worked on Watson’s programming, was at IBM’s event last night during the airing of the final Jeopardy match, and he explained how Watson worked and why it sometimes stumbled. The explanations give us insight into the nature of human intelligence compared to machine intelligence, one of the oldest technological problems known to computer scientists.
Prager said it’s instructive to compare Deep Blue, IBM’s supercomputer that beat chess champion Garry Kasparov in 1997, to Watson, which played Jeopardy. Chess is the “iconic representation of human intelligence,” Prager said, while Jeopardy is more like the “iconic representation of what it means to be human.” There is very little crossover between the algorithms used to play chess and those used for Jeopardy. That’s because Jeopardy represents a big “natural language understanding” problem.
For humans, understanding language is easy. But that’s the tough part for Watson. Questions like “where was Einstein born?” are easy for Watson if there is a precise answer in its database. If there isn’t an exact match, the computer would have to infer from various snippets of data that suggest the answer. For instance, if the database says that “Jack Welch ran GE like an artist,” the computer might think that Welch was an artist, rather than the former chief executive of General Electric.
Jeopardy has a broad range of topics, opening up the whole of human knowledge for the computer to try to understand. That also makes the artificial intelligence problem tough. Watson needs precision and it has to supply the top answer, in contrast to Google’s search engine, which can supply a range of answers. In Jeopardy, if you get a wrong answer, you lose points.
Watson also needs speed, since it’s playing against fast human players. The 2,880 IBM Power750 cores (or computing brains) help, as do the 15 terabytes of memory it has. The IBM researchers started four years ago and they found that their computers were extremely poor at getting Jeopardy questions right. The programming is in Java and C++. IBM also tapped a fan site that had typed in all of the previous Jeopardy questions and answers over the history of the game show.
“We had a long way to go,” Prager said.
There are a lot of components to what IBM came up with, including a “question analysis” system that starts working just as a question comes in. That part is like a search engine. It comes up with a couple of hundred possible answers that it wants to process further. Then it runs 100 to 200 algorithms to look for different features among the answers. A machine learning algorithm then sorts out which are the most important solutions. Then it calculates a confidence level for the rankings and only gives an answer if the confidence is above a certain threshold.
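To make the last two steps concrete, here is a minimal Python sketch of combining per-answer feature scores into a confidence level and answering only above a threshold. It is an illustration under stated assumptions, not IBM’s code (which, as noted above, is in Java and C++). The scorer names, weights, bias, thresholds and toy scores are all invented for the example, and the logistic function simply stands in for whatever learned ranking model Watson actually used.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# A scorer looks at (clue, candidate answer) and returns a feature value in [0, 1].
Scorer = Callable[[str, str], float]

# Hypothetical, hand-made evidence scores; a real system would compute these
# from its text sources rather than look them up in a table.
TOY_PASSAGE_SUPPORT = {"Ulm": 0.9, "Munich": 0.4, "Princeton": 0.3}


def passage_support(clue: str, answer: str) -> float:
    """Toy feature: how strongly stored text supports this answer."""
    return TOY_PASSAGE_SUPPORT.get(answer, 0.0)


def type_match(clue: str, answer: str) -> float:
    """Toy feature: every candidate below is a place, which "where" asks for."""
    return 1.0


def rank_candidates(
    clue: str,
    candidates: List[str],
    scorers: Dict[str, Scorer],
    weights: Dict[str, float],
    bias: float,
) -> List[Tuple[str, float]]:
    """Return (answer, confidence) pairs, most confident first."""
    ranked = []
    for answer in candidates:
        evidence = sum(weights[name] * fn(clue, answer) for name, fn in scorers.items())
        # Logistic squashing turns the weighted evidence into a 0..1 confidence.
        confidence = 1.0 / (1.0 + math.exp(-(evidence + bias)))
        ranked.append((answer, confidence))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def choose_answer(ranked: List[Tuple[str, float]], threshold: float) -> Optional[str]:
    """Give the top answer only if its confidence clears the threshold."""
    if not ranked:
        return None
    best, confidence = ranked[0]
    return best if confidence >= threshold else None


clue = "Where was Einstein born?"
ranked = rank_candidates(
    clue,
    ["Munich", "Princeton", "Ulm"],
    scorers={"passage_support": passage_support, "type_match": type_match},
    weights={"passage_support": 4.0, "type_match": 1.0},  # assumed weights
    bias=-3.0,                                            # assumed bias
)
print(choose_answer(ranked, threshold=0.5))  # "Ulm"
print(choose_answer(ranked, threshold=0.9))  # None: not confident enough to answer
```

With the looser threshold the sketch answers “Ulm”; with the stricter one it stays silent, which mirrors the behavior described above of answering only when confidence is high enough.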
In the category of U.S. Cities on Tuesday night, Watson gave the lousy answer “Toronto” in response to a question. That was because it didn’t put much weight on the category in terms of putting boundaries around its possible answers. There are actual cities named Toronto in the U.S., and the Toronto Blue Jays baseball team plays in the American League. These facts may have thrown Watson off, to comic effect. Watson put a lot of question marks after its answer, which means its confidence level was low and it was only answering because it was forced to answer.
It also said “1920s” in response to a question about when Oreo cookies were introduced. That was just after human opponent Ken Jennings was told he was wrong when he said “20s” on the same question. That was because IBM’s researchers chose to simplify Watson’s programming by making the computer “deaf” to the responses of other players. In many sparring games, the likelihood of that circumstance was very low, Prager said.
“It was sheer bad luck that happened,” Prager said.
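For illustration only, here is roughly what “listening” to other players could look like: before answering, drop any candidate that matches a response already ruled wrong. This Python sketch is an assumption about how such a filter might be written, not something Watson did (IBM chose to leave it out). The names `normalize` and `drop_rejected` are hypothetical, the candidate list is invented, and the decade rule is a toy.

```python
import re
from typing import Iterable, List


def normalize(response: str) -> str:
    """Reduce a response to a comparable form (toy rules only)."""
    text = response.strip().lower()
    text = re.sub(r"^(the|a|an)\s+", "", text)
    # Treat "1920s" and "20s" as the same decade by dropping the century.
    match = re.fullmatch(r"(?:\d{2})?(\d0)'?s", text)
    return f"{match.group(1)}s" if match else text


def drop_rejected(candidates: Iterable[str], rejected: Iterable[str]) -> List[str]:
    """Keep only candidates that do not match an already-rejected response."""
    ruled_out = {normalize(r) for r in rejected}
    return [c for c in candidates if normalize(c) not in ruled_out]


# An opponent's "20s" was ruled wrong, so "1920s" is filtered out.
print(drop_rejected(["1920s", "1910s"], rejected=["20s"]))  # ['1910s']
```

The catch is the normalization step: deciding that two differently worded responses mean the same thing is itself a language-understanding problem, which may be part of why skipping the feature was a reasonable simplification.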
Host Alex Trebek noted last night that Watson made random bets for dollar amounts when it had Final Jeopardy or Daily Double gambles to make. Prager said Watson determined it didn’t need to bet everything on Tuesday night in order to win. And by statistical analysis of the category, it predicted it wouldn’t do well on short categories or certain topics. So then it knew it had to bet small amounts sometimes. The researcher who programmed that part thought that betting “zero” would be boring and that a random amount would be funny.
Watson wasn’t connected to the Internet, but IBM took a lot of library-like sources and fed them into Watson’s database. That was all done ahead of time. When a program runs live, the code and data are fed into Watson’s memory. Then processors kick in and fetch the best answers. A lot of the data is duplicated and most of the data was updated to within a few weeks of the show’s filming.
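One simple way to see why a player with a big lead doesn’t need to bet everything is the standard “lock” arithmetic for a final wager: assume the closest opponent bets everything and answers correctly, then ask how much the leader can lose and still finish ahead. The Python sketch below shows only that arithmetic. It is not Watson’s betting model, which per Prager also drew on statistical analysis of the category, and the function name and scores are hypothetical.

```python
def largest_safe_wager(leader: int, runner_up: int) -> int:
    """Largest bet the leader can lose and still finish ahead of a runner-up
    who doubles their score. Returns 0 when the lead is not a lock."""
    if leader < 0 or runner_up < 0:
        raise ValueError("scores must be non-negative")
    return max(0, leader - 2 * runner_up - 1)


# Hypothetical scores: the leader has more than double the runner-up.
print(largest_safe_wager(30000, 10000))  # 9999: 30000 - 9999 = 20001 > 20000
# Hypothetical scores: the lead is too small to guarantee a win.
print(largest_safe_wager(15000, 10000))  # 0
```

Any amount from zero up to that ceiling is equally safe, which leaves room for the kind of odd-looking dollar amounts Trebek remarked on.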
IBM trained Watson in 55 sparring games against former Jeopardy winners. Watson won 71 percent of the time. That was the more scientific result compared to the championship matches. Prager said the experience of watching Watson was like a parent watching a child perform in a school play, where you hope the child doesn’t flub his or her lines.
“The trained eye can predict whether Watson will get the answer right when the question comes up,” Prager said. “Watson struggles [with] what they’re really asking for, and if the language is really clear. Jeopardy is an entertainment show and the question is often worded to inform or entertain.”
Burn Lewis, another IBM researcher, said that Watson actually had to press a buzzer to answer questions in the show. The text of each question is visible to the human players, so it “hits their retinas about the same time it hits Watson’s chips,” Prager said. Watson could press the buzzer within six milliseconds, faster than most humans can react. Lewis said that the humans who beat Watson on the buzzer were really gambling, or predicting they could answer a question upon hearing the last word.
Watson generally jumped around a lot on categories because it was looking for Daily Doubles, which offer the chance to get the most points from a question. There is usually a pattern to where contestants can find those Daily Doubles, Prager said.
IBM thinks that Watson’s innovations can be used in things such as automated customer support.
IBM has put up a blog post that delves into Watson’s Final Jeopardy trouble. There’s also a lot more on the subject of Watson in Stephen Baker’s new book, Final Jeopardy: Man vs Machine and the Quest to Know Everything.
You can check out a preview match between Watson, Jennings and Rutter (where the humans were also destroyed). Also, check out the video of Prager’s talk below. The video is dark because it was shot in a dark comedy club without much lighting, so I apologize for that. But you can hear Prager’s explanations fine.