Amazon’s AWS re:Invent conference in Las Vegas might not kick off until later this month, but the Seattle company’s cloud computing subsidiary is not wasting time. Today it detailed improvements heading to Comprehend, its natural language processing service that extracts key phrases, places, people’s names, brands, events, and sentiment from unstructured text.
New no-code customization tools — Custom Entities and Custom Classification — in Comprehend will allow developers to “identify natural language terms and classify text which is specialized to their team, business, or industry,” Dr. Matt Wood, general manager of learning and artificial intelligence (AI) at Amazon Web Services, wrote in a blog post.
“Many customers tell us they have a surplus of data — specifically, data comprising unstructured, natural language,” he said. “You likely won’t have to look far inside your own organization before you find a treasure trove of potential information … Helping find the needle inside this proverbial haystack is something machine learning is particularly good at.”
To that end, Custom Entities allows customers to “teach” Comprehend terms specific to a vertical or domain. From a small sampling of examples — say, a list of policy numbers and the text in which they’re used — Custom Entities can train a bespoke model capable of identifying target text in any given snippet.
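As a rough illustration of what such a "small sampling of examples" might look like on disk, the sketch below builds a two-column entity list in Python. The policy numbers, the POLICY_NUMBER type name, and the build_entity_list helper are all hypothetical, and the Text,Type header reflects one reading of Comprehend's entity-list format that should be checked against the current AWS documentation. The sketch only prepares the file; it does not call AWS.

```python
import csv
import io


def build_entity_list(entries, entity_type="POLICY_NUMBER"):
    """Return CSV text with one row per example entity.

    Assumption: the service accepts a CSV whose header row is Text,Type.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Text", "Type"])
    for text in entries:
        # Strip stray whitespace so the same entity is not listed twice
        # under slightly different spellings.
        writer.writerow([text.strip(), entity_type])
    return buf.getvalue()


# Hypothetical policy numbers, used purely for illustration.
sample = build_entity_list(["POL-2018-00417", " POL-2018-00932 "])
print(sample)
```

A file like this would then be uploaded alongside the documents in which the terms appear, and the service would train the custom model from the pair.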
A complementary new feature, Custom Classification, enables developers to group documents into named categories. With as few as 50 examples, Custom Classification can train a model that sorts emails, social media posts, analyst reports, and other documents into those categories based on their content.
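To make the "as few as 50 examples" figure concrete, here is a minimal Python sketch that assembles labeled examples into a training file and refuses to proceed below that threshold. The label names, the sample texts, and the build_training_csv helper are hypothetical. The headerless label,text layout is an assumption about the expected input and should be verified against the AWS documentation, as should whether the 50-example figure applies in total or per category (it is treated as a total here).

```python
import csv
import io
from collections import Counter

# Figure quoted in the article; treated here as a total across all labels.
MIN_EXAMPLES = 50


def build_training_csv(examples):
    """Turn (label, text) pairs into CSV text plus per-label counts."""
    # Collapse newlines and repeated spaces so each document fits on one row.
    rows = [(label.strip(), " ".join(text.split())) for label, text in examples]
    if len(rows) < MIN_EXAMPLES:
        raise ValueError(f"need at least {MIN_EXAMPLES} examples, got {len(rows)}")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue(), Counter(label for label, _ in rows)


# Hypothetical training data: 25 complaints and 25 compliments.
examples = [("COMPLAINT", f"My order {i} arrived\nbroken") for i in range(25)]
examples += [("PRAISE", f"Great service on ticket {i}") for i in range(25)]

csv_text, counts = build_training_csv(examples)
print(counts)
```

Checking the per-label counts before training is a cheap way to spot a badly skewed data set, which tends to hurt any classifier regardless of the service used.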
Both Custom Entities and Custom Classification are generally available. LexisNexis is already using the Custom Entities feature to extract legal entities from more than 200 million documents at greater than 92 percent accuracy, Dr. Wood wrote.
“Since the earliest days of AWS, our goal has been to take technology which is traditionally only within reach of large, well-funded organizations and to put it in the hands of all developers,” he wrote. “Under the hood, Comprehend will do the heavy lifting to build, train, and host the customized machine learning models, and make those models available through a private API … These new capabilities for Comprehend are a perfect reflection of this spirit; we’re excited to see what you build with them.”
The new and improved Comprehend follows on the heels of the opening of AWS’ second set of high-security GovCloud datacenters in the U.S. and Amazon’s announcement that it plans to open datacenters in Italy in 2020. Earlier this month, AWS made its Translate, Transcribe, and Comprehend services HIPAA-eligible.