Wolfram has just posted about the effort, which has taken years of working in stealth and involves more than a hundred workers. He explains the basics of how his “computational knowledge engine” works: You ask it factual questions (such as “How many protons are in a hydrogen atom?”), and it computes answers for you.
Many details about the engine, scheduled to launch in May, have yet to be released. However, Wolfram has shown it to search engine expert Nova Spivack. In a long post, Spivack calls the effort “almost absurdly ambitious” but concludes that it works, and claims that the engine has the potential to touch our lives as deeply as Google.
The engine doesn’t return documents that might contain the answer, as Google does, and it isn’t a giant database, like Wikipedia. Nor does it use natural language processing to retrieve documents, as Powerset does. Rather, Wolfram (pictured left) has created a proprietary system based on fields of knowledge, containing terabytes of curated data and millions of lines of algorithms to represent real-world knowledge as we know it.
You ask it questions in a bar that looks very much like Google’s search bar, but it interprets questions posed in natural language, or even in abbreviated notation, and then provides detailed answers.
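To make the difference between computing an answer and looking up a document concrete, here is a toy sketch in Python. It is not Wolfram's system, whose internals have not been released; the `FACTS` table, `parse`, and `answer` are hypothetical names invented for illustration. The idea is only that a question is first reduced to a precise form, and the answer is then computed from curated data.

```python
# Toy curated data: (element, property) -> value. Illustrative only;
# this is an assumption for the sketch, not Wolfram Alpha's data model.
FACTS = {
    ("hydrogen", "protons"): 1,
    ("helium", "protons"): 2,
    ("carbon", "protons"): 6,
}


def parse(question):
    """Reduce a natural-language question to a precise form: (property, [entities])."""
    words = question.lower().replace("?", " ").replace(",", " ").split()
    elements = {element for element, _ in FACTS}
    properties = {prop for _, prop in FACTS}
    entities = [w for w in words if w in elements]
    found = [w for w in words if w in properties]
    if not entities or not found:
        raise ValueError("question not understood")
    return found[0], entities


def answer(question):
    """Compute an answer from the curated facts instead of retrieving a document."""
    prop, entities = parse(question)
    return sum(FACTS[(entity, prop)] for entity in entities)


print(answer("How many protons are in a hydrogen atom?"))
print(answer("How many protons are in carbon and helium combined?"))
```

The second question has no stored answer anywhere in the table; the result is derived by combining two curated facts, which is the sense in which such an engine "computes" rather than retrieves.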
As Spivack summarizes, the vision seems to be to create a system that can do for formal knowledge (all the formally definable systems, heuristics, algorithms, rules, methods, theorems, and facts in the world) what search engines have done for informal knowledge (all the text and documents in various forms of media).
According to Wolfram:
With Mathematica, I had a symbolic language to represent anything—as well as the algorithmic power to do any kind of computation. And with NKS, I had a paradigm for understanding how all sorts of complexity could arise from simple rules. But what about all the actual knowledge that we as humans have accumulated?
But if one’s already made knowledge computable, one doesn’t need to do that kind of natural language understanding. All one needs to be able to do is to take questions people ask in natural language, and represent them in a precise form that fits into the computations one can do…I wasn’t at all sure it was going to work. But I’m happy to say that with a mixture of many clever algorithms and heuristics, lots of linguistic discovery and linguistic curation, and what probably amount to some serious theoretical breakthroughs, we’re actually managing to make it work.
…It’s certainly the most complex project I’ve ever undertaken. Involving far more kinds of expertise—and more moving parts—than I’ve ever had to assemble before. And—like Mathematica, or NKS— the project will never be finished.
Note that last part: “the project will never be finished.” It’s clear that this is a massive undertaking and that it faces serious challenges. Spivack, who has toiled away at his own knowledge engine based on semantics (a company called Twine), clearly has a lot of admiration for Wolfram and is diplomatic about his criticism, placing it lower in his piece. He raises a host of “hairy questions.” One is that many facts in life are “fuzzy,” such as the scientific evidence of global warming. Even here, though, Wolfram has taken pains in his model to provide multiple answers, and Spivack appears to conclude that this is not a big problem. And while Spivack calls Wolfram’s system and the engine’s user interface “beautiful,” he cautions that it was also designed by and for people “with IQ’s somewhere in the altitude of Wolfram’s — some work will need to be done dumbing it down a few hundred IQ points so as to not overwhelm the average consumer with answers that are so comprehensive that they require a graduate degree to fully understand.”
Notably, the engine is not built using standard semantic web languages such as RDF, OWL, and SPARQL, in part because ontologies written in them are too difficult to build and curate for such a wide field of knowledge.
According to Spivack:
This is not to say that Wolfram Alpha IS a cellular automaton itself — but rather that it is similarly based on fundamental rules and data that are recombined to form highly sophisticated structures. The knowledge and intelligence it contains are extremely modularized and can be used to synthesize answers to factual questions nobody has asked yet. The questions are broken down to their basic parts and then simple reasoning takes place, and answers are computed on the vast knowledge base in the system. It appears the system can make inferences and do some basic reasoning across what it knows — it is not purely reductionist in that respect; it is generative, it can synthesize new knowledge, if asked to.
Wolfram Alpha perhaps represents what may be a new approach to creating an “intelligent machine” that does away with much of the manual labor of explicitly building top-down expert systems about fields of knowledge (the traditional AI approach, such as that taken by the Cyc project), while simultaneously avoiding the complexities of trying to do anything reasonable with the messy distributed knowledge on the Web (the open-standards Semantic Web approach). It’s simpler than top down AI and easier than the original vision of Semantic Web.
Where Google is a system for FINDING things that we as a civilization collectively publish, Wolfram Alpha is for ANSWERING questions about what we as a civilization collectively know. It’s the next step in the distribution of knowledge and intelligence around the world — a new leap in the intelligence of our collective “Global Brain.” And like any big next-step, Wolfram Alpha works in a new way — it computes answers instead of just looking them up.
Apparently, the service will offer an API, so other developers can build on it.
I can’t wait to use this new engine. I remember when Powerset first emerged, claiming that it could use natural language to understand your questions and generating a lot of hype. The company didn’t live up to the hype but at least made a valuable contribution to the search engine field. Wolfram Alpha feels somewhat more realistic, because the magnitude of its task is obvious from the beginning, and because the founder concedes from the outset that this is a work in progress.
[Image credit: NNDB]