The importance of data audits when building AI

Artificial intelligence can do a lot to improve business practices, but AI algorithms can also introduce new avenues of risk. For example, consider Zillow’s recent shutdown of Offers, the branch of the company dedicated to buying fixer-uppers, after its prediction models significantly overshot house values. When housing price data changed unpredictably, the group's machine-learning models didn't adapt quickly enough to account for the volatility, resulting in significant losses. This type of data mismatch, or "concept drift," can go unnoticed when data audits don't get the care and attention they deserve.
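To make the idea of auditing for concept drift concrete, here is a minimal sketch in Python. It is not Zillow's method or any particular library's API: the function names, the 1.5x tolerance, and the price figures are all made up for illustration. It compares a model's average percentage error on a recent batch of sales against its error on an older baseline batch, and raises a flag when the recent error has grown well beyond the baseline.

```python
from statistics import mean


def mean_abs_pct_error(predicted, actual):
    """Average of |predicted - actual| / |actual| over paired values.

    Assumes every actual value is non-zero (true for sale prices).
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal length")
    return mean(abs(p - a) / abs(a) for p, a in zip(predicted, actual))


def drift_detected(baseline_error, recent_error, tolerance=1.5):
    """Flag drift when recent error exceeds baseline error by a tolerance factor.

    The default tolerance of 1.5 is a hypothetical choice, not a standard.
    """
    return recent_error > tolerance * baseline_error


# Made-up house prices: model predictions vs. eventual sale prices.
baseline_pred = [300_000, 450_000, 520_000]
baseline_actual = [310_000, 440_000, 500_000]
recent_pred = [330_000, 480_000, 560_000]
recent_actual = [290_000, 410_000, 470_000]

baseline_error = mean_abs_pct_error(baseline_pred, baseline_actual)
recent_error = mean_abs_pct_error(recent_pred, recent_actual)

if drift_detected(baseline_error, recent_error):
    print("Recent error is well above baseline; audit the data and model.")
```

A check like this only catches drift once true outcomes arrive, so in practice it would likely sit alongside checks on the input data itself.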

Zillow’s failure to properly audit its data didn't just hurt the company; it could have caused wider damage by scaring other businesses away from AI. Negative perceptions of a technology can halt its progress in the commercial world, especially for a category like AI that has already gone through several winters. Machine-learning pioneers like Andrew Ng recognize what hangs in the balance and have started campaigns to emphasize the importance of data audits, such as holding an annual competition for the best data quality assurance methods (instead of picking winners based solely on the model, as has traditionally been done).

Beyond my own work to build AI, as host of The Robot Brains podcast, I’ve also interviewed dozens of AI practitioners and researchers about their approach to auditing and maintaining high-quality data. Here are some of the best practices I've compiled from that work:

Applying AI models to solve business problems is becoming common as the open-source community makes them freely available to all. The downside is that as AI-generated insights and predictions become the status quo, the less flashy work of data maintenance can get overlooked. It’s like building a house on sand: it may look fine initially, but as time passes, the structure will collapse.

Professor Pieter Abbeel is Director of the Berkeley Robot Learning Lab and Co-Director of the Berkeley Artificial Intelligence Research (BAIR) Lab. He has founded three companies: Covariant (AI for intelligent automation of warehouses and factories), Gradescope (AI to help teachers with grading homework and exams), and Berkeley Open Arms (low-cost 7-DOF robot arms). He also hosts the podcast The Robot Brains.
