Salesforce Research wields AI to study medicine, economics, and speech

In 2015, Salesforce researchers working out of a basement under a Palo Alto West Elm furniture store developed the prototype of what would become Einstein, Salesforce's AI platform that powers predictions across its products. As of November, Einstein is serving over 80 billion predictions per day for tens of thousands of businesses and millions of users. But while the technology remains core to Salesforce's business, it's but one of many areas of research under the purview of Salesforce Research, Salesforce's AI R&D division.

Salesforce Research, whose mission is to advance AI techniques that pave the path for new products, applications, and research directions, is an outgrowth of Salesforce CEO Marc Benioff's commitment to AI as a revenue driver. In 2016, when Salesforce first announced Einstein, Benioff characterized AI as "the next platform" on which he predicted companies' future applications and capabilities will be built. The next year, Salesforce released research suggesting that AI's impact through customer relationship management software alone will add over $1 trillion to gross domestic products around the globe and create 800,000 new jobs.

Today, Salesforce Research's work spans a number of domains including computer vision, deep learning, speech, natural language processing, and reinforcement learning. Far from exclusively commercial in nature, the division's projects run the gamut from drones that use AI to spot great white sharks to a system that's able to identify signs of breast cancer from images of tissue. Work continues even as the pandemic forces Salesforce's scientists out of the office for the foreseeable future. Just this past year, Salesforce Research released an environment -- the AI Economist -- for understanding how AI could improve economic design, a tool for testing natural language model robustness, and a framework spelling out the uses, risks, and biases of AI models.

According to Einstein GM Marco Casalaina, the bulk of Salesforce Research's work falls into one of two categories: pure research or applied research. Pure research includes things like the AI Economist, which isn't immediately relevant to tasks that Salesforce or its customers do today. Applied research, on the other hand, has a clear business motivation and use case.

One particularly active subfield of applied research at Salesforce Research is speech. Last spring, as customer service representatives were increasingly ordered to work from home in Manila, the U.S., and elsewhere, some companies began to turn to AI to bridge the resulting gaps in service. Casalaina says that this spurred work on the call center side of Salesforce's business.

"We're doing a lot of work for our customers ... with regard to real-time voice cues. We offer this whole coaching process for customer service representatives that takes place after the call," Casalaina told VentureBeat in a recent interview. "The technology identifies moments that were good or bad but that were coachable in some fashion. We're also working on a number of capabilities like auto escalations and wrap-up, as well as using the contents of calls to prefill fields for you and make your life a little bit easier."

Medicine

AI with health care applications is another research pillar at Salesforce, Richard Socher, former chief scientist at Salesforce, told VentureBeat during a phone interview. Socher, who came to Salesforce following the acquisition of MetaMind in 2016, left Salesforce Research in July 2020 to found search engine startup You.com but remains a scientist emeritus at Salesforce.

"Medical computer vision in particular can be highly impactful," Socher said. "What's interesting is that the human visual system hasn't necessarily developed to be very good at reading x-rays, CT scans, MRI scans in three dimensions, or more importantly images of cells that might indicate a cancer ... The challenge is predicting diagnoses and treatment."

To develop, train, and benchmark predictive health care models, Salesforce Research draws from a proprietary database comprising tens of terabytes of data collected from clinics, hospitals, and other points of care in the U.S. It's anonymized and deidentified, and Andre Esteva, head of medical AI at Salesforce Research, says that Salesforce is committed to adopting privacy-preserving techniques like federated learning that ensure patients a level of anonymity.

"The next frontier is around precision medicine and personalizing therapies," Esteva told VentureBeat. "It's not just what's present in an image or what's present on a patient, but what the patient's future looks like, especially if we decide to put them on a therapy. We use AI to take all of the patient's data -- their medical images, records, their lifestyle. Decisions are made, and the algorithm predicts if they'll live or die, whether they'll live in a healthy state or unhealthy, and so forth."

Toward this end, in December, Salesforce Research open-sourced ReceptorNet, a machine learning system researchers at the division developed in partnership with clinicians at the University of Southern California's Lawrence J. Ellison Institute for Transformative Medicine of USC. The system, which can determine a critical biomarker for oncologists when deciding on the appropriate treatment for breast cancer patients, achieved 92% accuracy in a study published in the journal Nature Communications.

Typically, breast cancer cells extracted during a biopsy or surgery are tested to see if they contain proteins that act as estrogen or progesterone receptors. When the hormones estrogen and progesterone attach to these receptors, they fuel the cancer's growth. But the images produced by this kind of testing are less widely available and require a pathologist to review.

In contrast, ReceptorNet determines hormone receptor status via hematoxylin and eosin (H&E) staining, which takes into account the shape, size, and structure of cells. Salesforce researchers trained the system on several thousand H&E image slides from cancer patients in "dozens" of hospitals around the world.

Research has shown that much of the data used to train algorithms for diagnosing diseases may perpetuate inequalities. Recently, a team of U.K. scientists found that almost all eye disease datasets come from patients in North America, Europe, and China, meaning eye disease-diagnosing algorithms are less certain to work well for racial groups from underrepresented countries. In another study, Stanford University researchers identified most of the U.S. data for studies involving medical uses of AI as coming from California, New York, and Massachusetts.

But Salesforce claims that when it analyzed ReceptorNet for signs of age-, race-, and geography-related bias, it found that there was statistically no difference in its performance. The company also says that the algorithm delivered accurate predictions regardless of differences in the preparation of tissue samples.

"On breast cancer classification, we were able to classify some images without a costly and time-intensive staining process," Socher said. "Long story short, this is one of the areas where AI can solve a problem such that it could be helpful in end applications."

In a related project detailed in a paper published last March, scientists at Salesforce Research developed an AI system called ProGen that can generate proteins in a "controllable fashion." Given the desired properties of a protein, like a molecular function or a cellular component, ProGen creates proteins by treating the amino acids making up the protein like words in a paragraph.

The Salesforce Research team behind ProGen trained the model on a dataset of over 280 million protein sequences and associated metadata -- the largest publicly available. The model took each training sample and formulated a guessing game per amino acid. For over a million rounds of training, ProGen attempted to predict the next amino acids from the previous amino acids, and over time, the model learned to generate proteins with sequences it hadn't seen before.
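The "guessing game" described above, predicting each amino acid from the ones before it, can be illustrated with a deliberately tiny sketch. This is not ProGen: ProGen is a large neural language model, whereas the example below only counts which amino acid most often follows the previous one. The function names and the three toy sequences are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each amino acid follows each other amino acid."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Each adjacent pair is one round of the guessing game:
        # given `prev`, the correct answer is `nxt`.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent successor of `prev`, or None if unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


def generate(counts, start, length):
    """Greedily extend `start` one amino acid at a time."""
    seq = start
    while len(seq) < length:
        nxt = predict_next(counts, seq[-1])
        if nxt is None:
            break
        seq += nxt
    return seq


# Made-up toy sequences in one-letter amino acid codes (not real proteins).
toy_sequences = ["MKTAYIAK", "MKTLLVAG", "MKTAYLAG"]
model = train_bigram(toy_sequences)
print(generate(model, "M", 4))
```

A real model replaces the count table with a neural network that looks at the whole preceding sequence rather than only the last amino acid, which is what lets it produce plausible sequences it has never seen.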

In the future, Salesforce researchers intend to refine ProGen's ability to synthesize novel proteins, whether undiscovered or nonexistent, by homing in on specific protein properties.

Ethics

Salesforce Research's ethical AI work straddles applied and pure research. There's been increased interest in it from customers, according to Casalaina, who says he's had a number of conversations with clients about the ethics of AI over the past six months.

In January, Salesforce researchers released Robustness Gym, which aims to unify a patchwork of libraries to bolster natural language model testing strategies. Robustness Gym provides guidance on how certain variables can help prioritize what evaluations to run. Specifically, it describes the influence of a task via a structure and known prior evaluations, as well as needs such as testing generalization, fairness, or security; and constraints like expertise, compute access, and human resources.

In the study of natural language, robustness testing tends to be the exception rather than the norm. One report found that 60% to 70% of answers given by natural language processing models were embedded somewhere in the benchmark training sets, indicating that the models were usually simply memorizing answers. Another study found that metrics used to benchmark AI and machine learning models tended to be inconsistent, irregularly tracked, and not particularly informative.
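To make the memorization concern concrete, here is a minimal sketch of how one might measure the share of test answers that also appear in a training set. It is an assumption about the general idea, not the method used in the report cited above; the function name and the sample answers are hypothetical.

```python
def answer_overlap(train_answers, test_answers):
    """Fraction of test answers that also occur among the training answers."""
    # Normalize lightly so trivial casing/whitespace differences still match.
    seen = {a.strip().lower() for a in train_answers}
    if not test_answers:
        return 0.0
    hits = sum(1 for a in test_answers if a.strip().lower() in seen)
    return hits / len(test_answers)


# Made-up example data.
train = ["Paris", "Albert Einstein", "1969", "Pacific Ocean"]
test = ["paris", "Marie Curie", "1969 ", "Pacific Ocean"]
print(answer_overlap(train, test))
```

A high overlap suggests that a strong benchmark score may reflect recall of training answers rather than generalization, which is the kind of gap robustness testing is meant to expose.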

In a case study, Salesforce Research had a sentiment modeling team at a "major technology company" measure the bias of their model using Robustness Gym. After testing the system, the modeling team found a performance degradation of up to 18%.

In a more recent study published in July, Salesforce researchers proposed a new way to mitigate gender bias in word embeddings, the word representations used to train AI models to summarize, translate languages, and perform other prediction tasks. Word embeddings capture semantic and syntactic meanings of words and relationships with other words, which is why they're commonly employed in natural language processing. But they have a tendency to inherit gender bias.

Salesforce's proposed solution, Double-Hard Debias, transforms the embedding space into an ostensibly genderless one. It transforms word embeddings into a "subspace" that can be used to find the dimension that encodes frequency information distracting from the encoded genders. Then, it "projects away" the gender component along this dimension to obtain revised embeddings before executing another debiasing action.
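The "project away" operation mentioned above is ordinary vector projection: subtract from an embedding its component along a chosen direction, so that nothing of that direction remains. The sketch below shows only that single step, not the full Double-Hard Debias procedure, and the vectors and function names are invented for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project_away(v, direction):
    """Remove from `v` its component along `direction`."""
    d = normalize(direction)
    dot = sum(a * b for a, b in zip(v, d))
    # v' = v - (v . d) d, which is orthogonal to d.
    return [a - dot * b for a, b in zip(v, d)]


# Hypothetical 3-dimensional embedding and direction to remove.
embedding = [3.0, 1.0, 2.0]
direction = [1.0, 1.0, 0.0]
cleaned = project_away(embedding, direction)
print(cleaned)
```

After the projection, the dot product of the cleaned embedding with the removed direction is zero up to floating-point error, while components orthogonal to that direction are left unchanged.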

To evaluate Double-Hard Debias, the researchers tested it against the WinoBias data set, which consists of pro-gender-stereotype and anti-gender-stereotype sentences. Double-Hard Debias reduced the bias score of embeddings obtained using the GloVe algorithm from 15 (on two types of sentences) to 7.7 while preserving the semantic information.
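For a sense of scale, the drop from 15 to 7.7 is a relative reduction of roughly 49%. The sketch below works through that arithmetic and, as an assumption about how such a score is commonly defined on WinoBias, also shows a bias score computed as the gap between performance on pro-stereotype and anti-stereotype sentences. The function names and the sample scores passed to `bias_gap` are hypothetical.

```python
def bias_gap(pro_score, anti_score):
    """Assumed bias score: absolute gap between pro- and anti-stereotype performance."""
    return abs(pro_score - anti_score)


def relative_reduction(before, after):
    """Fraction by which a score dropped, relative to its starting value."""
    return (before - after) / before


# Figures reported in the article: 15 before debiasing, 7.7 after.
reduction = relative_reduction(15, 7.7)
print(f"{reduction:.1%}")

# Made-up scores, only to show how a gap would be computed.
example_gap = bias_gap(75.0, 60.0)
```

A model with no measurable stereotype preference would score the same on both sentence types, giving a gap of zero.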

Future work

Looking ahead, as the pandemic makes clear the benefits of automation, Casalaina expects automation to remain a core area of focus for Salesforce Research. For example, he expects chatbots built to answer customer questions to become more capable than they are today, along with robotic process automation technologies that handle repetitive back-office tasks.

There are numbers to back up Casalaina's assertions. In November, Salesforce reported a 300% increase in Einstein Bot sessions since February of this year, a 680% year-over-year increase compared to 2019. That's in addition to a 700% increase in predictions for agent assistance and service automation and a 300% increase in daily predictions for Einstein for Commerce in Q3 2020. As for Einstein for Marketing Cloud and Einstein for Sales, email and mobile personalization predictions were up 67% in Q3, and there was a 32% increase in converting prospects to buyers using Einstein Lead Scoring.

"The goal here -- and at Salesforce Research broadly -- is to remove the groundwork for people. A lot of focus is put on the model, the goodness of the model, and all that stuff," Casalaina said. "But that's only 20% of the equation. The 80% part of it is how humans use it."
