Machine learning algorithms abound in biodiversity research, but sometimes without proper attribution or oversight. In an effort to raise the academic bar, Google says it will release an AI workflow for institutions, developed in collaboration with the Global Biodiversity Information Facility (GBIF), iNaturalist, and Visipedia. Researchers at the tech giant say the workflow will support data aggregation and collaboration across teams while ensuring corpora follow standardized licensing terms, use compatible file formats, and provide fair and sufficient data coverage for the task at hand.

“The promise of machine learning for species identification is coming to fruition, revealing its transformative potential in biodiversity research,” wrote visiting faculty Serge Belongie and Google Research engineering director Hartwig Adam in a blog post published to coincide with the Biodiversity Next conference in Leiden, Netherlands. “International workshops … feature competitions to develop top performing classification algorithms for everything from wildlife camera trap images to pressed flower specimens on herbarium sheets. The encouraging results that have emerged from these competitions inspired us to expand the availability of biodiversity datasets and ML models from workshop-scale to global-scale.”

The workflow will comprise two parts: data sets packaged by GBIF and models trained and published by Google and Visipedia. The former will be vetted to guarantee they meet baseline license and citation requirements, and they’ll be issued with a digital object identifier (a persistent identifier or handle used to identify objects uniquely) and linked through the International Organization for Standardization’s DOI citation graph. Meanwhile, the latter will be available with documentation on TensorFlow Hub, Google’s public repository of machine learning models, where they’ll be accompanied by information about provenance, architecture, licensing, and more, along with interactive model demonstrations that run on user-supplied images.


Above: Illustration of live, interactive Mushroom Recognizer, powered by a publicly available model trained on a fungi dataset provided by the Danish Mycological Society.

Image Credit: Google

“Central to the tradition of scholarly research are the conventions of citation and attribution, and it follows that as ML extends its reach into the life sciences, it should bring with it appropriate counterparts to those conventions,” said Belongie and Adam. “More broadly, there is a growing awareness of the importance of ethics, fairness, and transparency within the ML community … We look forward to engaging with institutions around the globe to enable new and innovative uses of [machine learning] for biodiversity.”