TruEra brings automated AI testing to dev workflows for accelerated model deployment

Enabling responsible artificial intelligence (AI), with models that mitigate bias and work reliably and consistently, is no easy task.

Responsible AI involves both testing and providing AI explainability. Among the vendors that provide testing tools for AI is TruEra, which recently joined the Intel disruptor program to help advance explainable AI. Up to now, organizations have faced a challenge with tools like TruEra's. More often than not, testing has been a point-in-time exercise rather than a continuous, automated approach.

Continuous testing has long been standard practice in software development and devops, through continuous integration/continuous deployment (CI/CD) pipelines. But that approach has mostly not been available for AI and machine learning (ML) workloads.

"If you find a bug, how do you test, debug and identify the problem and then write a test so that the bug never comes back?" Will Uppington, cofounder and CEO of TruEra, told VentureBeat. "Machine learning developers do not have those tools today, they don't have systematic ways of evaluating and testing their models and then systemically debugging those models."

To that end, TruEra today announced the release of the 2.0 version of its diagnostics platform, TruEra Diagnostics, which provides data scientists with an automated test harness that integrates continuous testing into the AI/ML development workflow.

What an automated test harness brings to ML development

The common development workflow for data scientists is to use a Jupyter notebook to build out models.

Uppington explained that TruEra Diagnostics 2.0 can fit directly into notebooks. By adding several lines of code, a data scientist can create a test that will automatically run every time a model is trained. They can also set up a series of policies to automatically test a system when certain thresholds or constraints are met.
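As a rough illustration of that idea, and not TruEra's actual API, a test that runs every time a model is trained could be sketched in plain Python as follows. Every name here (ThresholdTest, train_with_tests, the toy cut-point model and its data) is a hypothetical stand-in chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical names throughout; this is not TruEra's API.
Model = Callable[[float], int]


@dataclass
class ThresholdTest:
    """A named metric plus the minimum score the model must reach."""
    name: str
    metric: Callable[[Model, Sequence[float], Sequence[int]], float]
    minimum: float

    def run(self, model: Model, xs: Sequence[float], ys: Sequence[int]) -> Tuple[str, float, bool]:
        score = self.metric(model, xs, ys)
        return self.name, score, score >= self.minimum


def accuracy(model: Model, xs: Sequence[float], ys: Sequence[int]) -> float:
    correct = sum(1 for x, y in zip(xs, ys) if model(x) == y)
    return correct / len(ys)


def train_with_tests(train_fn, tests: List[ThresholdTest], xs, ys):
    """Train a model, then run every registered test against it."""
    model = train_fn(xs, ys)
    results = [t.run(model, xs, ys) for t in tests]
    return model, results


def train_cut_point_model(xs, ys) -> Model:
    # Toy "training": place the cut point midway between the class means.
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= cut else 0


xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
tests = [ThresholdTest("accuracy", accuracy, minimum=0.9)]

model, results = train_with_tests(train_cut_point_model, tests, xs, ys)
for name, score, passed in results:
    print(f"{name}: {score:.2f} {'PASS' if passed else 'FAIL'}")
```

Because the tests are attached to the training call rather than run by hand, a retrained model cannot skip them, which is the property the continuous-testing approach is after.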

For example, if a model, either in development or in production, generates an error or failure, TruEra Diagnostics 2.0 can provide a link that will enable a developer to debug test results.

The TruEra system's user interface also provides recommendations to help data scientists determine what tests should be run. Uppington explained that the tests are often dependent on the specific model, but there are several common categories. One such category is testing for bias.

Users can run a test against bias metrics and see if the result is above or below whatever threshold the organization has deemed acceptable. If bias is above the threshold, a link provides a drill-down into the reason, including what features of the model are actually causing the bias to occur.
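A threshold check of this kind can be sketched with one common fairness measure, the demographic parity difference (the gap in positive-prediction rates between two groups). This is a minimal sketch under stated assumptions: the article does not say which bias metrics TruEra uses, and the function names, the sample predictions and the 0.1 threshold below are all hypothetical.

```python
from typing import Sequence, Tuple


def selection_rate(preds: Sequence[int]) -> float:
    """Share of cases that received the positive prediction (1)."""
    return sum(preds) / len(preds)


def demographic_parity_difference(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Absolute gap between the two groups' selection rates (0.0 = parity)."""
    return abs(selection_rate(preds_a) - selection_rate(preds_b))


def bias_test(preds_a, preds_b, max_gap: float = 0.1) -> Tuple[float, bool]:
    """Pass only if the gap is at or below the organization's threshold.

    The default max_gap is an illustrative assumption, not a standard.
    """
    gap = demographic_parity_difference(preds_a, preds_b)
    return gap, gap <= max_gap


# Hypothetical model predictions for two demographic groups.
group_a = [1, 1, 1, 0, 1, 0, 1, 1]  # 6 of 8 selected
group_b = [1, 0, 0, 1, 0, 0, 1, 0]  # 3 of 8 selected

gap, passed = bias_test(group_a, group_b)
print(f"parity gap = {gap:.3f}, {'PASS' if passed else 'FAIL'}")
```

A check like this only reports that the gap exceeds the threshold. Identifying which features drive it, as the drill-down described above does, requires separate attribution analysis that this sketch does not attempt.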

TruEra is also adding the ability to do comparative tests across model iterations to help identify degradation. Model summary dashboards aid in visualizing the tests of one version of a model against the tests of another, displaying individual models' test scores in a comparative table.
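The comparison itself is simple to picture. The sketch below, which is an illustration and not TruEra's implementation, prints two model versions' scores side by side and flags any metric that dropped by more than a tolerance. The metric names, scores and the 0.01 tolerance are made-up values, and higher is assumed to be better for every metric.

```python
from typing import Dict, List, Tuple


def find_degradations(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.01,
) -> List[Tuple[str, float, float]]:
    """Return (metric, old, new) for each metric that fell by more than tolerance.

    Assumes higher scores are better; metrics missing from the candidate are skipped.
    """
    drops = []
    for metric, old in baseline.items():
        new = candidate.get(metric)
        if new is not None and old - new > tolerance:
            drops.append((metric, old, new))
    return drops


# Hypothetical test scores for two iterations of the same model.
v1 = {"accuracy": 0.91, "auc": 0.95, "recall": 0.80}
v2 = {"accuracy": 0.92, "auc": 0.90, "recall": 0.80}

# Comparative table: one row per metric, one column per model version.
print(f"{'metric':<10}{'v1':>6}{'v2':>6}")
for metric in v1:
    print(f"{metric:<10}{v1[metric]:>6.2f}{v2[metric]:>6.2f}")

flagged = find_degradations(v1, v2)
for metric, old, new in flagged:
    print(f"degraded: {metric} fell from {old:.2f} to {new:.2f}")
```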

"One of the key things that you do with machine learning is you retrain your model," Uppington said. "And one of the things that you want to do when you retrain the models is you want to make sure that you haven't seen any performance degradation in your key quality metrics."

TruEra looks to help support AI regulatory compliance

While organizations are not required today to test models continuously and automatically before deploying them into production, they may be in the future.

Emerging AI regulations in the U.S. and in Europe may place new sets of compliance requirements on organizations that use AI. For example, Uppington pointed to a human resources (HR) law in New York City which will soon require organizations that are using machine learning in HR systems to provide an independent audit of their AI's impact and potential bias. The only way organizations will be able to comply, he said, is by constantly testing and validating models.

Continuous testing of models in development will also help to accelerate the deployment of models into production. Uppington said that he has spoken with financial services organizations where it has taken upwards of nine months to develop a model, in part due to extensive validation that occurs before a model is pushed into production. A goal of the new continuous-testing approach is to accelerate development.

Uppington explained that with TruEra, organizations can now more easily create testing templates that can be reused continuously throughout the development process.

"You don't wait until the end to run all your tests and then find out there is a big issue," Uppington said. "You do that continuously throughout the process, and that's what this enables."