Data science in a post-COVID world

I am often asked about the state of data science and where we sit now from a maturity perspective. The answer is pretty interesting, especially now that it’s been more than a year since COVID-19 rendered most data science models useless — at least for a time.

COVID forced companies to make a full model jump to match the dramatic shift in daily life. Models had to be rapidly retrained and redeployed to try to make sense of a world that changed overnight. Many organizations ran into a wall, but others were able to create new data science processes that could be put into production much faster and more easily than what they had before. From this perspective, data science processes have become more flexible.

Now there is a new challenge: post-pandemic life. People all over the world believe an end to the pandemic is in sight. But it is highly unlikely we will all just magically snap back to our pre-pandemic behaviors and routines. Instead, we’ll have a transition period that will require a long, slow shift to establish a baseline or new set of norms. During this transition, our data models will require near-constant monitoring as opposed to the wholesale jump COVID prompted. Data scientists have never encountered anything like what we should expect in the coming months.

Tipping the balance

If asked what we most miss about life before the pandemic, many of us will say things like traveling, going out to dinner, maybe going shopping. There is tremendous pent-up demand for all that was lost.

There’s a large group of people who have not been adversely affected financially by the pandemic. Because they haven’t been able to pursue their usual interests, they probably have quite a bit of cash at their disposal. Yet the current data science models that track spending of disposable income are probably not ready for a surge that will likely surpass pre-pandemic spending levels.

Pricing models are designed to optimize how much people are willing to pay for certain types of trips, hotel nights, meals, goods, etc. Airlines provide a great example. Prior to COVID-19, airline price prediction engines assumed all sorts of optimizations. They had seasonality built in as well as specific periods like holiday travel or spring break that drove prices even higher. They built various fare classes and more. They implemented very sophisticated, often manually crafted optimization schemes that were quite accurate until the pandemic blew them up.

But for life after COVID, airlines have to look beyond the usual categories to accommodate the intense consumer demand to get out and about. Instead of going back to their old models, they should be asking questions like “Can I get more money for certain types of trips and still sell out the airplane?” If airlines consistently run models to answer questions like this one, we’ll see an increase in prices for certain itineraries. This will go on for a period of time before we see consumers gradually begin to self-regulate their spending again. At a certain point, people will have spent the money they saved up. What we really need are models that identify when such shifts happen and that adapt continuously.

On the flip side, there is another segment of the population that experienced (and continues to experience) economic difficulties as a result of the pandemic. They can’t go wild with their spending because they have little or nothing left to spend. Maybe they still need to find jobs. This also skews economics, as millions of people are attempting to climb back up to the standard of living they had pre-COVID. People who previously would have played a sizable role in economic models are effectively removed from the equation for the time being.

Model drift

COVID was one big bang where things changed. That was easy to detect, but this strange period we will now be navigating -- toward some kind of new normal -- will be much harder to interpret. It's a case of model drift, where reality shifts slowly.

If organizations simply start deploying their pre-COVID models again, or if they stick with what they developed during the pandemic, their models will fail to give them proper answers. For example, many employees are ready to return to the office, but they may still opt to work from home a few days a week. This seemingly small decision affects everything from traffic patterns (fewer cars on the road at peak periods) to water and electric usage (people take showers at different times and use more electricity to power their home offices). Then there are restaurant and grocery sales — with fewer employees in the office, catered lunches and meals out with colleagues drop from pre-pandemic levels, while grocery sales must account for lunch at home. And here we're only looking at the effects of a single behavior (transitioning to partial work-from-home). Think about the ripple effects of changes to all the other behaviors that emerged during the pandemic.
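
The slow drift described here can be made concrete with a minimal sketch. It assumes a model whose predictions can be compared against actual outcomes a short time later, and it flags drift when the average error over a recent window grows well beyond the error seen in a period when the model was trusted. The class name, window size, tolerance, and the lunch-order numbers are all hypothetical choices for illustration, not a recommended configuration.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flags gradual drift by comparing recent error to a baseline error."""

    def __init__(self, baseline_errors, window=7, tolerance=1.5):
        # baseline_errors: absolute errors from a period when the model was trusted
        self.baseline = mean(baseline_errors)
        self.recent = deque(maxlen=window)  # keeps only the latest `window` errors
        self.tolerance = tolerance

    def update(self, predicted, actual):
        """Record one observation; return True if recent error looks drifted."""
        self.recent.append(abs(predicted - actual))
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough recent data to judge yet
        return mean(self.recent) > self.tolerance * self.baseline


# Made-up numbers: a model keeps predicting 100 catered lunches a day
# while actual demand slips by one lunch per day.
monitor = DriftMonitor(baseline_errors=[2, 3, 2, 3], window=5, tolerance=2.0)
first_alert_day = None
for day in range(1, 31):
    actual = 100 - day
    drifted = monitor.update(predicted=100, actual=actual)
    if drifted and first_alert_day is None:
        first_alert_day = day

print(first_alert_day)
```

No single day in this toy series looks alarming on its own; the alert fires only once the windowed average error has crept past the tolerance. That is the difference between catching a slow shift and catching a big bang.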

The slow march to normal

In establishing an environment to contend with this unprecedented challenge, organizations need to unite entire data science teams, not just the machine learning engineers. Data science is not just about training a new AI or machine learning model; it’s also about looking at different types of data as well as new data sources. And it means inviting business leaders and other collaborators into the process. Each participant plays a role because of all of the mechanics involved.

These teams should look at patterns that are emerging in geographies that have opened up again post-COVID. Is everything running at full capacity? How are things going? There is quite a bit of data that can be leveraged, but it comes in pieces. If we combine these learnings with what we saw prior to and during COVID to retrain our models, as well as ask new questions, then we’re looking at highly valuable data science, with mixed models that account for swings in practices and activities.

It is imperative that teams persistently monitor models (what they do, how they perform) to identify when they fall out of step with reality. This goes way beyond classic A/B testing and also involves challenger models and mixing models from pre-COVID with newer ones. Try out other hypotheses and add new assumptions. Organizations might be surprised to see what suddenly works much better than before, and then to see those model assumptions eventually fail again.
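
As a rough illustration of the challenger idea, the sketch below scores several candidate models on the most recent observations and promotes whichever one currently has the lowest error. The three models, their coefficients, and the data are invented for the example; a real setup would plug in trained models and rerun the comparison on a schedule.

```python
from statistics import mean


def mean_abs_error(model, observations):
    """Average absolute gap between a model's prediction and the outcome."""
    return mean(abs(model(x) - y) for x, y in observations)


def pick_champion(models, recent_observations):
    """Score every candidate on recent data and return the best one."""
    scores = {
        name: mean_abs_error(fn, recent_observations)
        for name, fn in models.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical models predicting office lunch orders from headcount.
models = {
    "pre_covid": lambda headcount: 0.8 * headcount,
    "pandemic": lambda headcount: 0.1 * headcount,
    # A simple 50/50 mix of the two older models.
    "blend": lambda headcount: 0.5 * (0.8 * headcount) + 0.5 * (0.1 * headcount),
}

# Made-up recent observations: (headcount, lunches actually ordered).
recent = [(100, 50), (120, 58), (80, 38)]

champion, scores = pick_champion(models, recent)
print(champion, scores)
```

On this invented data the blended model wins, but the point is the loop rather than the winner: rerunning the comparison as new observations arrive is what reveals when today's champion stops matching reality.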

Organizations should prepare themselves by putting in place a flexible data science function that can continuously build, update, and deploy models to represent an ever-evolving reality.

Michael Berthold is CEO and co-founder at KNIME, an open source data analytics company. He has more than 25 years of experience in data science, working in academia, most recently as a full professor at Konstanz University (Germany) and previously at University of California, Berkeley and Carnegie Mellon, and in industry at Intel’s Neural Network Group, Utopy, and Tripos. Michael has published extensively on data analytics, machine learning, and artificial intelligence. Follow Michael on Twitter, LinkedIn and the KNIME blog.
