Facebook's AI streamlines sentences while preserving meaning

Simplifying text's grammar and structure is a useful skill most of us acquire in school, but AI typically has a tougher go of it, owing to a lack of linguistic knowledge. That said, scientists at Facebook AI Research and Inria are progressing toward a simplification model dubbed ACCESS (AudienCe-CEntric Sentence Simplification), which they claim enables customization of text length, amount of paraphrasing, lexical complexity, syntactic complexity, and other parameters while preserving coherency.

"Text simplification can be beneficial for people with cognitive disabilities, such as aphasia, dyslexia, and autism, but also for second language learners and people with low literacy," wrote the researchers in a preprint paper detailing their work. "The type of simplification needed for each of these audiences is different ... Yet, research in text simplification has been mostly focused on developing models that generate a single generic simplification for a given source text with no possibility to adapt outputs for the needs of various target populations. [We] propose a controllable simplification model that provides explicit ways for users to manipulate and update simplified outputs as they see fit."

To this end, the team tapped seq2seq, a general-purpose encoder-decoder framework that maps an input sequence to an output sequence. The researchers prepended special tokens to each source sentence, with each token encoding the ratio of a parameter (like length) calculated on the target sentence to its value on the source sentence. In this way they conditioned the model on four selected parameters, namely length, paraphrasing, lexical complexity, and syntactic complexity.
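To make the token idea concrete, here is a minimal sketch of how a training pair could be tagged with ratio tokens. It is an illustration only, not the researchers' code: the token names (`<LENGTH_…>`, `<SIMILARITY_…>`), the 0.05 bucket size, and the use of Python's `difflib` similarity as a stand-in for a paraphrasing measure are all assumptions, and only two of the four parameters are shown.

```python
from difflib import SequenceMatcher


def length_ratio(source: str, target: str) -> float:
    """Character length of the target divided by that of the source."""
    return len(target) / len(source)


def similarity(source: str, target: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in paraphrasing proxy (assumption)."""
    return SequenceMatcher(None, source, target).ratio()


def round_to_bucket(value: float, step: float = 0.05) -> float:
    """Snap a ratio to the nearest multiple of `step` so the token vocabulary stays small."""
    return round(round(value / step) * step, 2)


def add_control_tokens(source: str, target: str, step: float = 0.05) -> str:
    """Prepend hypothetical control tokens, computed from the pair, to the source sentence."""
    tokens = [
        f"<LENGTH_{round_to_bucket(length_ratio(source, target), step):.2f}>",
        f"<SIMILARITY_{round_to_bucket(similarity(source, target), step):.2f}>",
    ]
    return " ".join(tokens + [source])


if __name__ == "__main__":
    complex_sentence = "The feline reposed upon the woven floor covering."
    simple_sentence = "The cat sat on the mat."
    print(add_control_tokens(complex_sentence, simple_sentence))
```

During training the ratios come from the aligned sentence pair. At generation time no target exists yet, so in a setup like this the user would pick the token values directly, for example a length token below 1.0 to request a shorter output.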

For the experiments, the team trained a Transformer model on the WikiLarge data set, which contains 296,402 samples of automatically aligned complex-simple sentence pairs from English Wikipedia and Simple English Wikipedia. They evaluated it on validation and test sets taken from TurkCorpus, where each complex sentence had eight paraphrased sentences (without splitting, oversimplified structures, or reduction of content) created by Amazon Mechanical Turk workers.

The team reports that on SARI, a popular metric that compares predicted simplifications with both the source and the target references, ACCESS scored 41.87, a "significant" improvement over the previous state of the art (40.45). And on FKGL (the Flesch-Kincaid Grade Level), a readability metric on which lower scores indicate simpler text and which doesn't account for grammaticality or meaning preservation, it ranked third best with 7.22.
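For readers unfamiliar with FKGL, the standard formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies it with a deliberately crude syllable counter (runs of vowels), so its scores will differ somewhat from those of established readability tools. The function names are illustrative and are not taken from the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level; lower values indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * (len(words) / len(sentences))
        + 11.8 * (syllables / len(words))
        - 15.59
    )


if __name__ == "__main__":
    print(round(fkgl("The cat sat."), 2))
    print(round(fkgl("Notwithstanding considerable complications, the administration persevered."), 2))
```

Because the formula looks only at sentence length and syllable counts, a short but ungrammatical or meaning-distorting output can still score well, which is why it is reported alongside SARI rather than on its own.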

"We confirmed through an analysis that each parameter has the desired effect on the generated simplifications," wrote the researchers, who believe their method helps pave the way for adapting text simplification to audiences with different needs. "This paper showed that explicitly conditioning [the models] ... on parameters such as length, paraphrasing, lexical complexity, or syntactic complexity increases their performance significantly for sentence simplification."
