Wednesday, May 6, 2009

Man v. Machine

Machine to compete on Jeopardy.

Well now. Anyone who follows AI research probably knows that natural language is terribly difficult for computers. Computers completely outclass humans at raw computation, and come in a close second in expert-system domains like mechanics, physics, and medicine, but a command of natural language has always been the holy grail of AI. Well, that and perception, like vision and hearing - the ability to extract meaningful information from a lot of noise. That's really part of the problem with language - it packs information very densely, but at the same time carries huge volumes of fluff.

Since this is an IBM publicity stunt, it's safe to assume that the underlying technology is already there. They'll be flying a supercomputer out to L.A. for the show, and the questions will be received electronically (with simple text-to-speech for the answers). Though I don't really know how the technology works, there are a few assumptions that I can make. It probably works off some combination of analytics and search, much like Wolfram|Alpha. In fact, both should probably be considered part of the new breed - Search 2.0. If they're running this contest anything like they ran Deep Blue against Kasparov, then they've probably built up a massive database of Jeopardy questions and answers - possibly even provided by the game show itself, because this is the classic "everyone wins" cross-publicity stunt.

If you read back through the blog, you'll know that I'm a big believer in the ability to form rules and conclusions based on large enough sample sizes. Eventually a general purpose "draw conclusions from the data" program will be created, and that will make a huge number of jobs obsolete. This is not that program - this is a very specific example: "build an algorithm to answer questions based on a pattern of previous questions and answers, as well as an index of knowledge." Actually, given a large enough QA sample size, I'm pretty sure that it wouldn't even need the index - it would be able to divine new facts from previous questions and answers. Jeopardy hasn't been around for that long though.
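To make the "answer from a pattern of previous questions and answers" idea a bit more concrete, here is a toy sketch in Python. To be clear, this is my own guess at the simplest possible version, not anything IBM has described: it just finds the past clue that shares the most words with the new one and returns the response that went with it. The function names and the three sample clues are made up for illustration.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def answer(clue, past_pairs):
    """Return the response attached to the most similar past clue, or None."""
    clue_tokens = tokenize(clue)
    best_score, best_response = 0.0, None
    for past_clue, response in past_pairs:
        score = jaccard(clue_tokens, tokenize(past_clue))
        if score > best_score:
            best_score, best_response = score, response
    return best_response

# Made-up sample data, for illustration only (not a real Jeopardy archive).
PAST_PAIRS = [
    ("This planet is known as the Red Planet", "What is Mars?"),
    ("This chess computer beat Garry Kasparov in 1997", "What is Deep Blue?"),
    ("This is the largest planet in the solar system", "What is Jupiter?"),
]

print(answer("IBM's chess computer that defeated Kasparov", PAST_PAIRS))
```

A real system would obviously need far more than word overlap, which is exactly why the index of knowledge matters while the sample of past questions is still small.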

All the press I've read has said that Watson, IBM's machine, won't be able to access the internet. Well, why would it want to? If it has local databases covering the broadest possible range of information, searching the internet would only take longer. On top of that, the whole thing is running on a giant supercomputer, which can just keep a local copy of the relevant sections of the internet indexed and ready to go. Even with a blisteringly fast connection, the live internet would be too slow.
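For a rough picture of what "indexed and ready to go" could mean, here is a minimal inverted index in Python. This is only a sketch of the general technique, assuming the documents fit in memory. I have no idea how Watson actually stores its data, and the three documents and the function names are invented for the example.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def lookup(index, query):
    """Return the sorted ids of documents containing every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)

# Invented documents, standing in for a local copy of relevant pages.
DOCUMENTS = {
    "doc1": "Deep Blue was a chess computer built by IBM",
    "doc2": "Jeopardy is an American television quiz show",
    "doc3": "IBM built a computer to play a quiz show",
}
INDEX = build_index(DOCUMENTS)

print(lookup(INDEX, "IBM computer"))
```

The index is built once, ahead of time, so each query is just a handful of in-memory set lookups with no network round trip involved.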

Mostly, I'm just excited for what Google is going to do with this kind of technology. They're already working on their own version of the QA system, and Google has access to the Google Library - 7 million digitized books and counting. That of course means more information to crawl through than is available to any other company on the planet. Google also has the best and brightest working for it, and the economic incentive to either beat everyone to the punch or outclass them.