The real secret to unlocking big data? Math


It’s often said that mathematics is the universal language. It expresses ideas in pure logic and crosses every domain, from business to the social and physical sciences, and even art.

Yet with the advent of more and more (big) data, we have not fully used the power of math as a universal language. Instead, we’ve focused on narrower concepts like queries and pattern matching to search information and find relationships in data faster. As technologists, we know all of this ends up as 1s and 0s, but the algorithms behind the data get less attention, leaving many problems locked inside the jargon and established practices of their own domains.

Why is this important? Math is to data what abstraction layers are to software development. With it, we can translate problems as different as physics and energy management for the International Space Station, drug discovery, efficient DNA search, finding the tomb of Genghis Khan, and nearly everything else into a common vocabulary for stating problems.

Researchers at the U.S. Agency for International Development (USAID) even used this approach to build a way to predict where atrocities would occur around the world. They relied on a crowdsourced community of thousands to identify new data sets, refine what should be predicted, and finally create a better predictive model. Most of the participants had no deep knowledge of the political science of conflict or of existing sociological models of human behavior, but they could pursue the problem intelligently because they understood “how the math should work.”

The approach has obvious benefits. Once the problem has been expressed mathematically, domain knowledge matters less: skills can be leveraged across industries and domains, and core technologies can be applied more broadly to optimize or solve key questions. In effect, it creates an API for our problem.

APIs in software allow companies to build ecosystems around their products. People outside the organization can extend those products or change their behavior, improving them for themselves and sometimes for the whole industry. Similarly, by expressing our questions mathematically and asking for solutions in those terms, we invite a community well beyond our own internal resources to help us discover an answer in or with the data. Through this “API,” we recast a problem that requires deep domain expertise as one that requires skill with “just” mathematics.

Related to this, a well-known professor once commented to me, “Big data is misnamed in our (academic) world, because data sets have always been big. What is different is that we now have the technology to simply run every scenario. Before, intuition was critical, as you could otherwise spend months chasing a concept. Now, set up correctly, we can just run or solve the model like an equation.”

This ability to turn deep domain problems into complex math problems is groundbreaking for data scientists and big data. For the first time, it allows low-friction collaboration with communities of people who may have no association with us. Many crowd communities, including TopCoder, have used this abstraction to solve and optimize problems across many domains, achieving dramatic 1000x-type improvements. These results seem unbelievable at first, but once you see them repeated in domain after domain, it becomes evident that when you can communicate your challenge as a math problem, you can have hundreds or thousands of people helping to optimize an answer. It’s only natural that such a community will come up with a better solution than a couple of researchers with deep domain expertise. In hindsight, the outcomes are not surprising; nor is the fact that setting up the problem correctly is often the most important and trickiest part.

While technology and tools are important in any large data exercise, think about how you can use concepts of mathematics to abstract your problem away from your domain. This will allow others inside and outside of your firm to help you find the best answer to the question that drives your business. After all, as Bill Joy once said, “No matter who you are, most of the smartest people work for someone else.”

Narinder Singh is the president of TopCoder and cofounder of Appirio, and has more than 15 years of software and business experience. Prior to Appirio, he worked at SAP in the Office of the CEO as part of the Corporate Strategy Group. Before SAP, he managed R&D, sales, and marketing as vice president and general manager of the workflow business unit at webMethods (WEBM), and previously led R&D for the company’s BPM, workflow, B2B, and industry products. Narinder began his career with Accenture at its Center for Strategic Technology. He has also worked with several nonprofits on their development and supports a number of causes, including Architecture for Humanity, World Vision, and Ensaaf.
