You’ve been following Watson behind the scenes as he was created at IBM. We’ve now all seen him thrash two humans, Ken Jennings and Brad Rutter, both former champions, on Jeopardy!. Here’s what I’ve been dying to ask someone in the know: how was it possible, on the second night of the tournament, that Watson thought Toronto was a U.S. city?
Let’s review this. The Jeopardy! category was U.S. Cities, and the clue was, ‘This city’s largest airport is named after a World War II hero, and its second largest airport after a World War II battle.’ Watson starts hunting around, looking for these two connections to airports throughout the U.S. and, more broadly, North America and the world. Why would it look beyond the U.S.? Because Watson is never completely sure that it understands the clue. It has to hedge a bit, and allow for the fact that it might not understand.
Watson has also learned, through statistical analysis of the Jeopardy! categories, that the category doesn’t always dictate the kind of answer a clue wants. For example, a clue on American novelists might say, ‘This masterpiece features a young man named Holden Caulfield.’ The answer to the clue is not J.D. Salinger, it’s The Catcher in the Rye. Watson is aware, statistically at least, that categories can’t always be trusted.
So in the U.S. cities/airport question, Watson goes on a hunt and never really finds an answer it has high confidence in. It has abysmal confidence in both Toronto, which has a couple of airports named after World War I heroes, and Chicago. It probably doesn’t understand the Battle of Midway, so it doesn’t make that connection. Because it has very low confidence, it doesn’t rule out Canada. A lot of people would say, ‘Well, that’s a sign of idiocy,’ and you could argue that, in this case, it was. But Watson has to allow for exceptions.
What exactly do you mean?
Say there was a clue about a 1970s rock star, a male, famous for a song called ‘School’s Out’. If Watson, like traditional computers, had a list of female names to use to figure out if someone was a male or female, it would never consider Alice Cooper. The list would tell it that Alice Cooper was a woman’s name. (The same applies to Evelyn Waugh.) Watson is built so that if there’s enough evidence that something is right, even if it appears wrong, it can go with it.
But the IBM people, at one point, had a team devoted to trying to steer Watson away from this type of embarrassment. It was called the ‘dumb team’. They knew how much it would hurt IBM if Watson, despite an overwhelming performance, committed a brain-dead mistake like that. They actually considered having Watson say ‘I don’t know’ instead of hazarding a guess! Because we have common sense when we guess, and Watson utterly lacks it…
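The Alice Cooper example can be made concrete with a toy comparison. This is only an illustrative sketch, not IBM's actual scoring: the `Candidate` class, the `penalty` value and all the numbers are hypothetical. It contrasts a hard filter, which throws out anything that breaks a rule, with a soft score that treats the rule as one more piece of evidence.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    evidence: float      # hypothetical 0..1 score from supporting text
    matches_prior: bool  # fits the "obvious" rule (name list, category)?


def hard_filter(candidates):
    """Traditional approach: discard every candidate that breaks the rule."""
    allowed = [c for c in candidates if c.matches_prior]
    return max(allowed, key=lambda c: c.evidence, default=None)


def soft_score(candidates, penalty=0.5):
    """Scale a rule-breaker's score down by `penalty` instead of dropping it."""
    def score(c):
        return c.evidence * (1.0 if c.matches_prior else penalty)
    return max(candidates, key=score, default=None)


# Made-up numbers for a clue about a male rock star who sang "School's Out".
candidates = [
    Candidate("Alice Cooper", evidence=0.9, matches_prior=False),
    Candidate("Elton John", evidence=0.3, matches_prior=True),
]

print(hard_filter(candidates).name)
print(soft_score(candidates).name)
```

With these made-up numbers the hard filter never considers Alice Cooper, while the soft score lets strong evidence outweigh the name list. The same leniency is what leaves a low-confidence answer like Toronto in play when the category says U.S. Cities.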
We should be getting on to your books, but I can’t resist asking: What about the timing issue? A lot of these questions are quite easy, perhaps not for me, but for the likes of Ken Jennings and Brad Rutter. So it’s not about whether they know the answer, but how quickly they can get to the buzzer. Now I know Watson can’t hear, so he was receiving his question via an electronic signal, which presumably arrives instantly. Doesn’t that give him an unfair advantage over the humans, who have to wait for Alex Trebek [the show’s host] to read out the question?
Watson ‘sees’ the electronic version of the message at the same moment that that message is exposed to the humans on the screen. Humans do not listen to Alex Trebek. No one listens to Alex Trebek.
Oh. What do they do?
What the humans do is speed-read the clue on the screen. They immediately look for keywords to see if they have enough confidence to bet on it, so that if they win the buzz, they can produce the answer in the 4-5 seconds they are allowed. You can see Ken Jennings, especially, doing that: winning the buzz and then struggling to come up with the answer. That’s what humans do.
But doesn’t Watson have an advantage because he can read it instantly?
I would argue the opposite. I would say that that is one of Watson’s great disadvantages: seeing words and struggling to understand them. Language is extremely foreign and difficult for Watson. Where Watson has a real advantage is in its speed to the buzzer. Once it has calculated that it has enough confidence in its answer to buzz, then, as soon as the light goes on, opening the floor to buzzing, Watson requires only 1/100th of a second to buzz. The humans are not that fast. The way humans compensate for this (and Jennings and Rutter were themselves famous for this, frustrating other Jeopardy! players) is that they anticipate the end of Alex Trebek’s sentence. Almost the way a jazz musician can sense the coming downbeat, they establish a rhythm with the guy sitting at the desk who opens the buzzer. So when Alex Trebek is finishing his sentence, they anticipate it and beat normal humans to the buzz. Of course, the danger is that in Jeopardy!, if you buzz too early, you’re locked out for a crucial quarter of a second. So Watson beat them on the buzz in most categories, and some people, Alex Trebek for one, didn’t think this was fair.
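The buzzer race described here can be sketched as a small timing model. It is a simplified illustration under stated assumptions: the 0.5 confidence threshold is hypothetical, and the model assumes a player who buzzes early simply presses again the instant the lockout ends. Only the 1/100th-of-a-second reaction and the quarter-second lockout come from the interview.

```python
MACHINE_REACTION = 0.01  # seconds after the buzzer opens (from the interview)
LOCKOUT = 0.25           # penalty for buzzing early (from the interview)


def should_buzz(confidence, threshold=0.5):
    """Buzz only if confidence clears a (hypothetical) threshold."""
    return confidence >= threshold


def buzz_time(attempt, open_at, lockout=LOCKOUT):
    """Time at which a press made at `attempt` actually registers.

    A press before `open_at` locks the player out for `lockout` seconds;
    this model assumes an immediate retry once the lockout has ended.
    """
    if attempt < open_at:
        return max(attempt + lockout, open_at)
    return attempt


open_at = 0.0
machine = buzz_time(open_at + MACHINE_REACTION, open_at)

# A human who anticipates well lands just after the buzzer opens and wins.
good_guess = buzz_time(0.005, open_at)
# A human who anticipates slightly too early is locked out and loses.
early_guess = buzz_time(-0.02, open_at)

print(good_guess < machine)
print(early_guess > machine)
```

In this model the only way for a human to beat the machine is to anticipate the opening almost perfectly, and a press that is even a fraction of a second early costs far more than the machine's entire reaction time.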
I felt the same.
But think of it from this point of view. In 2006, the head of research at IBM was going around the various research teams saying, ‘We need a new challenge. We did chess. Now we need to do something that puts computing into the worlds of language and knowledge. Let’s do a Jeopardy! thing.’ Everyone turned him down, because IBM had a system that answered questions — it was a slow-responding programme that got two out of three questions wrong. And these were questions that were much more simply posed than the Jeopardy! questions.
Stephen Baker was BusinessWeek’s senior technology writer for a decade. He is the author, most recently, of Final Jeopardy: Man vs. Machine and the Quest to Know Everything, a behind-the-scenes look at an IBM team’s development of Watson, the computer that recently competed on the American general knowledge game show Jeopardy!.