This article was contributed by Susan Wu, senior director, marketing research at PubMatic.
Data is a cornerstone of modern business, capable of surfacing enlightening, even breakthrough, insights for business decisions. But one dataset can tell many stories, and sometimes those stories simply aren't aligned with reality. The polling data released ahead of the 2021 New Jersey governor's election provides one example: the data forecast a sizable lead for incumbent Phil Murphy, but in the end he clinched victory by an extremely thin margin.
This is not the first, nor likely the last, time that a dataset has told a misleading story, which raises the question: Is the data reliable?
While the answer isn't always clear-cut, data can prove effective and informative when it is appropriately managed. Data sources in today's business environment are virtually limitless and constantly evolving, creating unprecedented opportunities to leverage data successfully, but also countless pitfalls when data is inappropriately analyzed and applied. Avoiding such failure requires accurately defining datasets, identifying data limitations, and establishing reliable analysis.
Defining the dataset
Quantitative data, or information that can be measured or quantified, clearly plays a critical role in business decision-making, but it must not be viewed as the absolute pathway to success, given the many unquantifiable intangibles that inevitably arise in analyzing and applying data. In other words, relying entirely on quantitative data to reach decisions may lead to disappointing results.
No cookie-cutter method of analyzing data has yet been discovered. However, by framing problems clearly and accurately, the chances of solving data-specific issues increase dramatically. Our team, for example, generates a quarterly industry report that looks at ad spend by industry category. We sought to understand which ad categories were most impacted by events — the global health crisis (which is still a major ongoing event), along with the U.S. presidential election, housing boom, and most recently, the economic recovery — and how the market would recover, so we could anticipate, or at least manage expectations about, potential future impacts.
While a regression would have been overkill for such research, category classification and segmentation techniques were helpful in understanding seasonality and discretionary spending among categories. The pandemic naturally created anomalies, which had to be accounted for in the data. At first, looking at year-over-year changes during certain 2020 months showed only that ad spending was declining. But by looking at quarter-over-quarter changes, we were able to extract the leading category indicators driving different phases of recovery, which more accurately represented the trend lines.
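The contrast between the two views can be sketched in a few lines of Python. The spend figures below are purely illustrative, not PubMatic data: year-over-year comparisons show every 2020 quarter down against 2019, while quarter-over-quarter comparisons reveal the rebound within 2020.

```python
# Hypothetical quarterly ad spend, keyed by (year, quarter); numbers are illustrative only.
spend = {
    ("2019", "Q1"): 100, ("2019", "Q2"): 105, ("2019", "Q3"): 108, ("2019", "Q4"): 115,
    ("2020", "Q1"): 95,  ("2020", "Q2"): 70,  ("2020", "Q3"): 85,  ("2020", "Q4"): 100,
}

def pct_change(new, old):
    """Percent change from old to new, rounded to one decimal place."""
    return round(100 * (new - old) / old, 1)

quarters = list(spend)          # dicts preserve insertion order in modern Python
values = list(spend.values())

# Year-over-year: each 2020 quarter versus the same quarter in 2019.
yoy = {q: pct_change(spend[("2020", q)], spend[("2019", q)]) for q in ("Q1", "Q2", "Q3", "Q4")}

# Quarter-over-quarter: each quarter versus the one immediately before it.
qoq = {quarters[i]: pct_change(values[i], values[i - 1]) for i in range(1, len(values))}
```

On this toy series, every value in `yoy` is negative, masking the shape of the recovery, while `qoq` turns positive from Q3 2020 onward, which is the kind of signal the quarter-over-quarter view surfaced.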
Data hygiene is king, but every dataset comes with limitations. Consistent, high-quality, unbiased data is the source of impactful insight into trends, while compromises in any of these areas tend to bias the resulting information. To minimize this risk, it is vital to maintain constant, vigilant awareness of data limitations (e.g., understanding how and where the data was mined) and to seek ways to keep the data in check.
Trend analyses are often used to anticipate future events based on historical behaviors. In the case of our quarterly global digital advertising spend reports, the pandemic made the analyses fairly challenging because the market was volatile for an extended period. To create insightful analysis at an industry level, we employ a regimented protocol for the raw data: it is regularly mined from our systems to produce an error-free dataset for analysis. The data is aggregated, checked against other sources, and then vetted to ensure there is no unintentional bias in the data pool. Only then do we start the analysis, as the result will have much greater accuracy.
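A vetting step like the one described above might look something like the following minimal Python sketch. The record shape (`category`, `date`, `spend` fields) and the specific checks are assumptions for illustration, not the author's actual protocol:

```python
# Minimal sketch of a raw-data vetting step before aggregation.
# Assumes each record is a dict with hypothetical "category", "date", and "spend" fields.
def vet(records):
    """Drop exact duplicates and records with missing or negative spend, then total by category."""
    seen, clean = set(), []
    for r in records:
        key = (r.get("category"), r.get("date"), r.get("spend"))
        if key in seen:
            continue  # skip exact duplicate rows from overlapping pulls
        seen.add(key)
        if r.get("spend") is None or r["spend"] < 0:
            continue  # skip incomplete or invalid records
        clean.append(r)
    totals = {}
    for r in clean:
        totals[r["category"]] = totals.get(r["category"], 0) + r["spend"]
    return totals

# Illustrative raw pull containing a duplicate, a missing value, and an invalid row.
raw = [
    {"category": "travel", "date": "2020-Q2", "spend": 50},
    {"category": "travel", "date": "2020-Q2", "spend": 50},   # duplicate
    {"category": "retail", "date": "2020-Q2", "spend": None}, # missing value
    {"category": "retail", "date": "2020-Q2", "spend": 80},
    {"category": "travel", "date": "2020-Q3", "spend": -5},   # invalid value
]
totals = vet(raw)
```

Real pipelines would add cross-source reconciliation and bias checks on top of this, but the principle is the same: analysis only begins once the dataset has passed explicit, repeatable gates.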
Reliable data analysis
Less is more when analyzing and writing about data. Readers typically do not require every detail, and data reliability benefits significantly from focused, disciplined writing. Intent reigns when writing data-specific content: insights must articulate only the necessary aspects of the story. Data reliability grows markedly alongside a strong written analysis, as does the likelihood of applying it successfully to business problems.
A second, equally critical, element of data reliability is continued exploration of, and learning from, other research and data professionals. Innovative approaches and new data resources surface at a frequency never before seen. Keeping up with current trends in a constantly evolving field is a task in itself, yet failure to do so may render all data processes irrelevant and, ultimately, send a business the way of the dinosaur.
Data is ubiquitous. On one hand, it is absolutely essential for making informed business decisions in today's global business environment. On the other, it poses the enormous, constant challenge of accurately interpreting a dataset specific to any given objective. In the end, data is only as valuable as the quality of the analysis: the more refined and meticulous that process, the more valuable a role data can play in everyday decision-making.
Susan Wu is a senior director of marketing research at PubMatic.