A few weeks ago, a dejected CTO told me it took his team three weeks to build a machine learning model. I told him a model in just three weeks sounded great, and he agreed. So why the long face? Because 11 months later, the model was still sitting on a shelf.
That gap between great AI prototypes and AI in operation has become a common theme as AI and machine learning make contact with the real world. The reason is … actually, there are a lot of reasons, and we can look at a bunch of them, but underneath all the others is the fact that data doesn’t sit still and never will.
Data changes as the world changes. Building an AI or machine learning model means building a way of looking at the world. But as the world and the data change, the models need to adapt. The CTO I met was realizing that building a great model is only the first step.
A model on its own is too brittle for the real world. It needs to live within a larger system that is itself fluid. So how do we make AI systems that are fluid? By building them with five attributes in mind.
For AI and machine learning to do real and lasting work, they need thoughtful, durable, and transparent infrastructure. That starts with mapping your data pipelines and correcting bad or missing data. It also means integrated data governance and version control for models. The version of each model (and you might run thousands of them concurrently) should point back to the data it was trained on. You’ll want to know, and so will regulators.
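To make that concrete, here is a minimal sketch of a model registry that ties each model version to its training inputs. The model names, dataset identifiers, and features are all hypothetical, and a production registry would persist to governed storage rather than memory:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelRecord:
    """One versioned model entry: the version points back to its inputs."""
    name: str
    version: str
    training_data: list[str]   # identifiers of the datasets used for training
    feature_list: list[str]
    trained_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

class ModelRegistry:
    """Minimal in-memory registry; real systems persist and govern this."""
    def __init__(self):
        self._records = {}

    def register(self, record: ModelRecord) -> None:
        self._records[(record.name, record.version)] = record

    def lineage(self, name: str, version: str) -> list[str]:
        """Answer the auditor's question: what data produced this version?"""
        return self._records[(name, version)].training_data

registry = ModelRegistry()
registry.register(ModelRecord("fraud-scorer", "2.3.1",
                              training_data=["txns_2023_q4", "chargebacks_v7"],
                              feature_list=["amount", "merchant_risk", "velocity"]))
print(registry.lineage("fraud-scorer", "2.3.1"))
# ['txns_2023_q4', 'chargebacks_v7']
```

The design choice that matters is the lineage lookup: given any (name, version) pair in production, you can trace it back to specific datasets without digging through ad hoc records.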
Being fluid means accepting from the outset that AI models fall out of sync. That “drift” can happen quickly or slowly depending on what’s changing in the real world. Do the data science equivalent of regression testing, and do the testing frequently, but without burning up your time.
That takes a system that allows you to set accuracy thresholds and automatic alerts to let you know when models need attention. Will you need to retrain the model on old data, acquire new data, or re-engineer your features from scratch? The answer depends on the data and the model, but the first step is knowing there’s a problem.
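The threshold-and-alert idea can be sketched in a few lines. The baseline accuracy, the weekly scores, and the 5-point threshold below are all illustrative numbers, not a recommendation:

```python
def check_drift(recent_accuracy, baseline_accuracy, threshold=0.05):
    """Flag a model when live accuracy drops more than `threshold`
    below the accuracy it showed at deployment time."""
    return (baseline_accuracy - recent_accuracy) > threshold

# Hypothetical weekly evaluation results for one deployed model
baseline = 0.92
weekly_scores = [0.91, 0.90, 0.88, 0.85]

alerts = [week for week, acc in enumerate(weekly_scores, start=1)
          if check_drift(acc, baseline)]
print(alerts)  # [4] -- week 4 is where the model needs attention
```

In practice the alert would page a person or open a ticket; the point is that the system, not an engineer's memory, decides when a model needs attention.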
Most AI is computationally intense — both during training and after deployment. And most models need to score transactions in milliseconds, not minutes, to prevent fraud or leverage some fleeting opportunity. Ideally, you can train models on GPUs and then deploy them on high-performance CPUs, along with enough memory for real-time scoring.
And of course you want everything to run fast and error-free regardless of where you deploy: on-prem, cloud, or multicloud.
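Whatever hardware you deploy on, it helps to measure per-transaction scoring latency directly against a millisecond budget. The scoring function below is a trivial stand-in, not a real model:

```python
import time

def score(transaction):
    """Stand-in for a deployed model's scoring function."""
    return sum(transaction.values()) % 2  # placeholder logic only

def latency_ms(fn, arg, runs=1000):
    """Average latency per call in milliseconds. Real-time fraud scoring
    needs this in the millisecond range, not minutes."""
    start = time.perf_counter()
    for _ in range(runs):
        fn(arg)
    return (time.perf_counter() - start) / runs * 1000

txn = {"amount": 125, "merchant_id": 9912, "hour": 23}
print(f"avg scoring latency: {latency_ms(score, txn):.4f} ms")
```

A check like this belongs in deployment tests so that a latency regression, whether from the model or the environment, fails loudly before it reaches production.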
For the moment, budgets for AI and machine learning projects are generous, but those budgets will dry up if data science teams can’t deliver concrete results. Think from the outset about how you’ll quantify and visualize what you’re learning and how it changes: improvements in data access and data volume, improvements in model accuracy, and ultimately improvements to the bottom line.
Don’t just think about what you need to measure now but also about what you’ll want to measure in the future as your data science work matures. Is the system fluid enough to track those long-term goals?
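One lightweight way to keep those measurements is to log a dated snapshot per run, so trends can be plotted later. The metric names and file path here are illustrative assumptions, chosen to mirror the categories above (data volume, accuracy, bottom line):

```python
import json
from datetime import date

def log_metrics(path, **metrics):
    """Append one dated metrics snapshot per line (JSON Lines),
    so long-term trends can be charted as the work matures."""
    snapshot = {"date": date.today().isoformat(), **metrics}
    with open(path, "a") as f:
        f.write(json.dumps(snapshot) + "\n")

log_metrics("fluid_ai_metrics.jsonl",
            model_accuracy=0.91,
            rows_ingested=1_250_000,
            est_fraud_prevented_usd=48_000)
```

Because each line is self-describing JSON, you can add new metrics next year without breaking the history you have already collected.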
I started by pointing out that data doesn’t sit still. The fifth and final aspect of fluid AI is continuous learning as the world changes. Use tools, such as Jupyter and Zeppelin notebooks, that can plug into processes for scheduling evaluations and retraining models.
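A scheduled evaluation step can be as simple as a rolling accuracy check that a cron job or notebook scheduler calls after each run. The window size and accuracy floor below are assumptions for illustration:

```python
import statistics
from collections import deque

class ContinuousEvaluator:
    """Sketch of a scheduled evaluation step: a scheduler calls record()
    after each evaluation run, and a retrain is triggered when the
    rolling mean accuracy falls below the floor."""
    def __init__(self, window=4, floor=0.85):
        self.scores = deque(maxlen=window)
        self.floor = floor

    def record(self, accuracy: float) -> str:
        self.scores.append(accuracy)
        rolling = statistics.mean(self.scores)
        return "retrain" if rolling < self.floor else "ok"

ev = ContinuousEvaluator(window=3, floor=0.88)
results = [ev.record(a) for a in [0.92, 0.90, 0.86, 0.84]]
print(results)  # ['ok', 'ok', 'ok', 'retrain']
```

Using a rolling window rather than a single score keeps one noisy evaluation from triggering an unnecessary retrain, while a sustained slide still gets caught.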
At the same time, expect your own learning to grow and evolve as you absorb the advantages and limitations of various algorithms, languages, datasets, and tools. Fluid AI demands continuous improvement for data, tools, and systems, but also continuous improvement from everybody doing the work.
Data science is a journey. Cheesy, but true. Pay attention to these five attributes and you’ll bring focus to each moment and force yourself to find clarity about the future.
The data will never sit still, but would you really want it any other way?
Dinesh Nirmal is the vice president of analytics development at IBM.