IBM, Harvard develop tool to tackle black box problem in AI translation

In recent years, machine translation has improved immensely thanks to advances in deep learning and neural networks. However, the advantages of neural networks come at the cost of not knowing for sure what goes on inside them, which means it’s hard to troubleshoot their mistakes, such as when they translate “good morning” in Arabic to “attack them” in Hebrew.

Researchers at IBM and Harvard University have developed a new debugging tool to address this issue. Presented at the IEEE Conference on Visual Analytics Science and Technology in Berlin last week, the tool lets creators of deep learning applications visualize the decisions an AI model makes when translating a sequence of words from one language to another.

Called Seq2Seq-Vis, the tool is one of several efforts that aim to interpret decisions made by deep neural networks. Widely known as the “black box problem,” the opacity of neural networks has become one of the serious challenges of the AI industry, especially as deep learning finds its way into more critical domains.

Seq2Seq-Vis is focused on “sequence-to-sequence” models, the AI architecture used in most modern machine translation systems. “Sequence-to-sequence models can learn to transform an arbitrary-length input sequence into an arbitrary-length output sequence,” says Hendrik Strobelt, a scientist at IBM Research, adding that aside from language translation, sequence-to-sequence models are also used in other fields such as question-answering, summarization of long text and image captioning. “These models are really powerful and state-of-the-art in most of these tasks,” he says.

In a nutshell, sequence-to-sequence translation models run a source string through several neural networks to map it to the target language and refine the output to make sure it is grammatically and semantically correct. The introduction of neural networks has improved the results dramatically, but has also made the application more complex.

Visualizing machine translation

Strobelt compares the debugging of traditional language translator software to using a phone book. “Whenever something went wrong, you could look into this book and find out the rule that was producing the error message, and you could fix the rule,” he says. “The problem is, for these highly complex, end-to-end trained networks, you can’t create such a book easily. So we were thinking of what could be the replacement for something like this. And this essentially drove our goal for Seq2Seq-Vis.”

Strobelt showed us how the tool works on its demo website, which has an example of a German-to-English translation gone wrong. The sentence “die längsten reisen fangen an , wenn es auf den straßen dunkel wird .” should be translated to “The longest journeys begin when it gets dark in the streets.” But the AI model has translated it to “the longest travel begins when it gets to the streets.”

Seq2Seq-Vis creates a visual representation of the different stages of the sequence-to-sequence translation process. This enables the user to examine the model’s decision process and find where the error is taking place.

Seq2Seq-Vis also shows how each of the words in the input and output sentences maps to training examples in the neural networks of the AI model. “The most complicated part of the explanation is how to connect the decisions to the training examples,” Strobelt says. “The training data describes the world of the model. The model doesn’t know more about the world than what was presented by the training data. And so it makes sense to take a look at the training data when debugging a model.”
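
To make the idea of connecting a decision to training examples more concrete, here is a minimal sketch of one plausible approach: a nearest-neighbor lookup over stored vectors. It is an illustration under stated assumptions, not the tool's actual code. The function names, the three-dimensional vectors and the example labels are all hypothetical, and cosine similarity is just one possible way to measure closeness.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_examples(query, training_states, k=2):
    """Return the labels of the k stored training states closest to the query.

    training_states is a list of (label, vector) pairs. Both the labels and
    the vectors are made up for this sketch.
    """
    ranked = sorted(
        training_states,
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]

# Hypothetical internal states recorded while the model saw training sentences.
TRAINING_STATES = [
    ("training example 1", [0.9, 0.1, 0.0]),
    ("training example 2", [0.7, 0.3, 0.1]),
    ("training example 3", [0.0, 0.2, 0.9]),
]

# Hypothetical state produced for one word of the sentence being debugged.
query_state = [1.0, 0.1, 0.0]
print(nearest_examples(query_state, TRAINING_STATES, k=2))
```

If the closest training examples turn out to be poor or unrelated translations, that points toward the training data rather than the search or attention steps.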

For instance, by using the visual tools, a user can determine whether the error was due to bad training examples given to the encoder and decoder, the neural networks that read the sentence in the source language and generate the sentence in the destination language; a misconfiguration in the “attention model,” the component that connects the encoder and decoder networks; or a problem in the “beam search,” the search algorithm that keeps several candidate translations in play and picks the most likely one.
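
For readers unfamiliar with beam search, the following is a minimal sketch of the general technique, not the implementation used by Seq2Seq-Vis or any particular translation system. The toy probability table stands in for a real decoder and is entirely hypothetical; a real model would condition on the whole sentence rather than only the previous word.

```python
import math

# Hypothetical toy "decoder": probability of the next token given the last one.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"longest": 0.55, "long": 0.45},
    "a": {"long": 0.95, "longest": 0.05},
    "longest": {"journeys": 0.6, "travel": 0.4},
    "long": {"journey": 0.9, "travel": 0.1},
    "journeys": {"</s>": 1.0},
    "journey": {"</s>": 1.0},
    "travel": {"</s>": 1.0},
}

def beam_search(model, beam_width=2, max_len=5):
    """Return finished hypotheses as (log_probability, tokens), best first."""
    beams = [(0.0, ["<s>"])]
    finished = []
    for _ in range(max_len):
        # Extend every live hypothesis by every possible next token.
        candidates = []
        for score, tokens in beams:
            for token, prob in model[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the beam_width highest-scoring extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for candidate in candidates[:beam_width]:
            if candidate[1][-1] == "</s>":
                finished.append(candidate)
            else:
                beams.append(candidate)
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished

greedy = beam_search(TOY_MODEL, beam_width=1)[0]
wider = beam_search(TOY_MODEL, beam_width=2)[0]
print(" ".join(greedy[1]), math.exp(greedy[0]))
print(" ".join(wider[1]), math.exp(wider[0]))
```

With a beam width of 1 the search commits to the most probable first word and ends up with a less probable sentence overall; with a width of 2 it keeps the runner-up alive long enough to find a better one. A beam that is too narrow is one of the failure modes a debugging tool would need to expose.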

Correcting sequence-to-sequence models

Seq2Seq-Vis is not the only project that tries to explain decisions made by artificial intelligence. Solving the black box problem has become increasingly important to the AI industry and has attracted several academic institutions, large tech companies, and DARPA, the research arm of the Defense Department. IBM researchers also recently proposed a separate initiative to increase transparency in AI using factsheets.

Seq2Seq-Vis needs access to training data and other inner details about the AI model it wants to debug. In contrast, some of the other explainable AI approaches only need access to the outputs of the neural networks to interpret their decisions.

However, while most other approaches focus solely on interpreting AI decisions, Seq2Seq-Vis allows users to apply corrections to their models. “We were able to do both sides. We were able to do the visualization, but we were also able to change the underlying backend,” Strobelt says. This is what Strobelt calls “what-if testing.”

For instance, users can select and correct words in the output sequence, or they can reconfigure the way the attention model maps input and output positions.
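
The attention part of such a what-if test can be sketched in a few lines. This is a simplified illustration using plain dot-product attention, with an optional override standing in for a user's manual correction. The function names, the `override` parameter and the two-dimensional vectors are assumptions made for this example and are not taken from the Seq2Seq-Vis code.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states, override=None):
    """Return (weights, context) for one decoding step.

    If override is given (hypothetical user-supplied weights, one per input
    position), it replaces the weights the model would have computed.
    """
    if override is not None:
        total = sum(override)
        weights = [w / total for w in override]
    else:
        scores = [
            sum(d * e for d, e in zip(decoder_state, enc))
            for enc in encoder_states
        ]
        weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted average of encoder states.
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context

# Hypothetical encoder states for three input words, and one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]

model_weights, model_context = attend(decoder_state, encoder_states)
# "What if" the decoder had looked only at the third input word?
forced_weights, forced_context = attend(
    decoder_state, encoder_states, override=[0.0, 0.0, 1.0]
)
print(model_weights, model_context)
print(forced_weights, forced_context)
```

Comparing the output produced from the forced context with the original output shows whether a misplaced attention weight was responsible for the error.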

But Seq2Seq-Vis is not meant for end users of translation applications. It requires general knowledge of how sequence-to-sequence models work. This makes sense, because as Strobelt explains, the tool is aimed at the architects and trainers of AI models.

So who is interested in Seq2Seq-Vis? “We’re currently talking about how we could use it internally at IBM. But the source code is open source, so I can imagine a lot of companies would want to jump on board,” Strobelt said.

Ben Dickson is a software engineer and the founder of TechTalks, a blog that explores the ways technology is solving and creating problems.
