Microsoft's AI improves text summarization performance by paying closer attention to the beginning

A newsy feature from The New York Times is bound to have a different tone than the average Reddit post. Indeed, the diversity of writing styles and grammatical structures makes automatic text summarization highly challenging. That's why researchers from Pittsburgh and Microsoft Research's Future Social Experiences (FUSE) lab, which focuses on real-time and media-rich experiences, developed an AI system that pays close attention to the beginning of the documents it summarizes. The team says the approach improved performance in experiments, particularly on web forum content but also on more generic forms of text.

This research follows the publication of a Microsoft Research study detailing a "flexible" AI system capable of reasoning about relationships in "weakly structured" text. The coauthors claim it could outperform conventional natural language processing models on a range of text summarization tasks.

As the researchers point out, forum discussion threads usually start with posts or comments seeking knowledge or help, with subsequent comments tending to respond to the original post by providing additional information or opinions. Often, this initial text contains important topical information that could be useful in summarization.

The proposed AI exploits this dependency between original posts and replies, but it also tries to weed out irrelevant or superficial replies so they don't degrade the resulting summary.
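To make the idea of "weeding out" replies concrete, here is a minimal sketch, assuming each post and reply has already been turned into a numeric vector (by some sentence encoder not shown here). It keeps only replies whose cosine similarity to the original post clears a threshold. The function names `cosine` and `filter_replies` and the 0.2 threshold are hypothetical illustrations; the researchers' model learns this behavior with attention rather than a fixed cutoff.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_replies(post_vec: Sequence[float],
                   reply_vecs: Sequence[Sequence[float]],
                   threshold: float = 0.2) -> List[int]:
    """Return indices of replies similar enough to the original post to keep.

    The fixed threshold is a hypothetical stand-in for a learned relevance signal.
    """
    return [i for i, reply in enumerate(reply_vecs)
            if cosine(post_vec, reply) >= threshold]


# Toy vectors: reply 1 shares nothing with the original post and is dropped.
post = [1.0, 0.0, 1.0]
replies = [[1.0, 0.0, 0.9], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
print(filter_replies(post, replies))  # [0, 2]
```

A hard cutoff is easy to reason about but brittle; a learned model can instead down-weight marginal replies smoothly.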

The researchers trained and evaluated their model on two summarization corpora: one drawn from a TripAdvisor forum containing 700 threads (500 for training and 200 for validation and testing) and another containing 532 Microsoft Word documents spanning various subjects (266, 138, and 128 for training, validation, and testing, respectively). The model ingested keywords extracted from each sentence as well as sentence-level representations of the whole document, enabling it to learn which sentences were salient and to select those sentences to form summaries.
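The final step of an extractive summarizer like this one is choosing which sentences make the cut. The sketch below assumes a model has already assigned each sentence a salience score (the scores here are made up) and simply keeps the top `k`, returning them in their original order so the summary still reads naturally. `select_summary` and its parameters are hypothetical names, not the researchers' code.

```python
from typing import List, Sequence


def select_summary(sentences: Sequence[str],
                   scores: Sequence[float],
                   k: int = 2) -> List[str]:
    """Keep the k highest-scoring sentences, returned in document order."""
    if len(sentences) != len(scores):
        raise ValueError("need exactly one score per sentence")
    # Rank indices by score, highest first; sorted() is stable, so ties
    # keep the earlier sentence first.
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    # Restore original document order for the chosen sentences.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


# Hypothetical forum thread with hypothetical salience scores.
thread = [
    "Is the old town walkable from the train station?",
    "Yes, it takes about 15 minutes on foot.",
    "I went there last year.",
    "Taxis are also cheap if you have luggage.",
]
scores = [0.9, 0.8, 0.1, 0.6]
print(select_summary(thread, scores, k=2))
```

Because the output is built from sentences that already exist in the input, this is extractive rather than abstractive summarization.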

In the future, the researchers plan to incorporate more generic data sets into training and testing to further validate their approach. They also plan to vary the number of sentences the model ingests from the beginning of generic documents.

"We make use of the tendency of introducing important information early in the text by attending to the first few sentences in generic textual data," they wrote in a paper detailing their work. "Evaluations demonstrated that attending to introductory sentences using bidirectional attention improves the performance of extractive summarization models [even when] applied to more generic form[s] of textual data."
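The paper doesn't spell out its architecture in this article, so the following is only a rough sketch of what "bidirectional attention" to introductory sentences can mean, assuming plain sentence vectors and scaled dot-product attention with no learned weights. In one direction, every sentence attends over the first `n_intro` sentences; in the other, each introductory sentence attends over the whole document, and those results are averaged. Each sentence vector is then concatenated with both contexts. The function names and `n_intro` parameter are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: Sequence[Sequence[float]]) -> Vector:
    """Scaled dot-product attention of one query over keys (keys double as values)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * key[j] for w, key in zip(weights, keys)) for j in range(d)]


def bidirectional_intro_attention(sent_vecs: Sequence[Sequence[float]],
                                  n_intro: int = 3) -> List[Vector]:
    """Augment each sentence vector with context from and about the opening sentences."""
    if not sent_vecs or n_intro < 1:
        raise ValueError("need at least one sentence and n_intro >= 1")
    intro = sent_vecs[:n_intro]
    d = len(sent_vecs[0])
    # Direction 1: every sentence attends over the introductory sentences.
    to_intro = [attend(s, intro) for s in sent_vecs]
    # Direction 2: each introductory sentence attends over the whole document;
    # the results are averaged into one document-aware view of the introduction.
    from_intro = [attend(q, sent_vecs) for q in intro]
    intro_summary = [sum(v[j] for v in from_intro) / len(from_intro) for j in range(d)]
    # Output per sentence: [original vector ; its intro context ; shared intro summary].
    return [list(s) + c + intro_summary for s, c in zip(sent_vecs, to_intro)]


doc = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]]
augmented = bidirectional_intro_attention(doc, n_intro=2)
print(len(augmented), len(augmented[0]))
```

The enriched vectors would then feed a salience scorer. `n_intro` corresponds to the number of opening sentences the researchers say they plan to vary.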
