

While discussions about AI often center on the technology’s commercial potential, researchers are increasingly investigating ways that AI can be harnessed to drive societal change. Among others, Facebook chief AI scientist Yann LeCun and Google Brain cofounder Andrew Ng have argued that mitigating climate change and promoting energy efficiency are preeminent challenges for AI researchers.

In this vein, researchers at the Montreal AI Ethics Institute have proposed a framework designed to quantify the social impact of AI through techniques like compute-efficient machine learning. An IBM project delivers farm cultivation recommendations from digital farm “twins” that simulate the future soil conditions of real-world crops. Other researchers are using AI-generated images to help visualize climate change, and nonprofits like WattTime are working to reduce households’ carbon footprint by automating when electric vehicles, thermostats, and appliances are active based on where renewable energy is available.

Seeking to spur further exploration in the field, a group at the Stanford Sustainability and Artificial Intelligence Lab this week released (to coincide with NeurIPS 2021) a benchmark dataset called SustainBench for monitoring sustainable development goals (SDGs), including agriculture, health, and education, using machine learning. As the coauthors told VentureBeat in an interview, the goal is threefold: (1) lower the barriers to entry for researchers to contribute to achieving SDGs; (2) provide metrics for evaluating SDG-tracking algorithms; and (3) encourage the development of methods where improved AI model performance facilitates progress toward SDGs.

“SustainBench was a natural outcome of the many research projects that [we’ve] worked on over the past half-decade. The driving force behind these research projects was always the lack of large, high-quality labeled datasets for measuring progress toward the United Nations Sustainable Development Goals (UN SDGs), which forced us to come up with creative machine learning techniques to overcome the label sparsity,” the coauthors said. “[H]aving accumulated enough experience working with datasets from diverse sustainability domains, we realized earlier this year that we were well-positioned to share our expertise on the data side of the machine learning equation … Indeed, we are not aware of any prior sustainability-focused datasets with similar size and scale of SustainBench.”

Motivation

Progress toward SDGs has historically been measured through civil registrations, population-based surveys, and government-orchestrated censuses. However, data collection is expensive, leading many countries to go decades between taking measurements on SDG indicators. It’s estimated that only half of SDG indicators have regular data from more than half of the world’s countries, limiting the ability of the international community to track progress toward the SDGs.

“For example, early on during the COVID-19 pandemic, many developing countries implemented their own cash transfer programs, similar to the direct cash payments from the IRS in the United States. However … data records on household wealth and income in developing countries are often unreliable or unavailable,” the coauthors said.

Innovations in AI have shown promise in helping to plug the data gaps, however. Data from satellite imagery, social media posts, and smartphones can be used to train models to predict things like poverty, annual land cover, deforestation, agricultural cropping patterns, crop yields, and even the location and impact of natural disasters. For example, the governments of Bangladesh, Mozambique, Nigeria, Togo, and Uganda used machine learning-based poverty and cropland maps to direct economic aid to their most vulnerable populations during the pandemic.
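To make the idea concrete, the sketch below fits a simple model on a handful of surveyed locations and then estimates an indicator for locations that were never surveyed. It is a minimal illustration under stated assumptions, not the method used by SustainBench or by any of the governments mentioned: the single "nighttime light" feature, the wealth index values, and all function names are hypothetical, and real systems typically rely on far richer inputs and models.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance; cannot fit a line")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, xs):
    """Apply a fitted (slope, intercept) pair to new feature values."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def r_squared(y_true, y_pred):
    """Coefficient of determination: share of variance the model explains."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up numbers for illustration only (not real survey or satellite data).
# Each surveyed location has a sensor-derived feature and a ground-truth label.
surveyed_light = [0.1, 0.4, 0.5, 0.9, 1.3]      # hypothetical nighttime light
surveyed_wealth = [-0.9, -0.1, 0.1, 0.7, 1.6]   # hypothetical wealth index

# Locations with sensor data but no survey: the "data gap" to be filled.
unsurveyed_light = [0.2, 0.7, 1.1]

model = fit_line(surveyed_light, surveyed_wealth)
fit_quality = r_squared(surveyed_wealth, predict(model, surveyed_light))
estimates = predict(model, unsurveyed_light)

print(f"slope={model[0]:.3f} intercept={model[1]:.3f} R^2={fit_quality:.3f}")
print("estimated wealth index for unsurveyed locations:", estimates)
```

Note that the fit quality here is measured on the same locations used for training, which overstates accuracy; a realistic evaluation would hold out surveyed locations that the model never saw.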

But progress has been hindered by challenges, including a lack of expertise and a dearth of data for low-income countries. With SustainBench, the Stanford researchers, along with contributors at Caltech, UC Berkeley, and Carnegie Mellon, hope to provide a starting point for training machine learning models that can help measure SDG indicators and have a wide range of applications for real-world tasks.

SustainBench contains a suite of 15 benchmark tasks across seven SDGs taken from the United Nations, including good health and well-being, quality education, and clean water and sanitation. Beyond this, SustainBench offers tasks for machine learning challenges that cover 119 countries, each designed to promote the development of SDG measurement methods on real-world data.

The coauthors caution that AI-based approaches should supplement, rather than replace, ground-based data collection. They point out that ground truth data are necessary for training models in the first place, and that even the best sensor data can only capture some — but not all — of the outcomes of interest. But AI, they still believe, can be helpful for measuring sustainability indicators in regions where ground truth measurements are scarce or unavailable.

“[SDG] indicators have tremendous implications for policymakers, yet ‘key data are scarce, and often scarcest in places where they are most needed,’ as several of our team members wrote in a recent Science review article. By using abundant, cheap, and frequently updated sensor data as inputs, AI can help plug these data gaps. Such input data sources include publicly available satellite images, crowdsourced street-level images, Wikipedia entries, and mobile phone records, among others,” the coauthors said.

Future work

In the short term, the coauthors say that they’re focused on raising awareness of SustainBench within the machine learning community. Future versions of SustainBench are in the planning stages, potentially with additional datasets and AI benchmarks.

“Two technical challenges stand out to us. The first challenge is to develop machine learning models that can reason about multi-modal data. Most AI models today tend to work with single data modalities (e.g., only satellite images, or only text), but sensor data often comes in many forms … The second challenge is to design models that can take advantage of the large amount of unlabeled sensor data, compared to sparse ground truth labels,” the coauthors said. “On the non-technical side, we also see a challenge in getting the broader machine learning community to focus more efforts on sustainability applications … As we alluded to earlier, we hope SustainBench makes it easier for machine learning researchers to recognize the role and challenges of machine learning for sustainability applications.”

Thanks for reading,

Kyle Wiggers

AI Staff Writer
