Analyzing text on the Internet to measure how positive it is — product reviews on Amazon.com, for example — has become easier and less expensive with tools from AlchemyAPI, Semantria, and other companies.
But finding the text actually worth mining can be a chore in itself.
To help with that, Semantria has announced a formal partnership with Diffbot, a company that does the grunt work of finding important passages.
Diffbot uses what it calls “computer vision” technology to scour websites for meaningful information, shedding things like complex surrounding Web code. It then churns out clean text for analysis.
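To give a rough sense of what "shedding surrounding Web code" means, here is a minimal sketch in Python. It is not Diffbot's method: Diffbot describes its approach as computer vision, whereas this toy simply skips a hand-picked set of tags using the standard library's HTML parser. The class name, function name, skipped-tag list, and sample page are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything inside the SKIP tags."""

    # Hypothetical list of tags treated as page clutter.
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def clean_text(html):
    """Return the page's remaining text as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Made-up sample page: navigation and a tracking script around two sentences.
page = (
    "<html><nav>Home | Deals</nav>"
    "<p>Great candle.</p><p>Smells like cinnamon.</p>"
    "<script>track()</script></html>"
)
text = clean_text(page)
print(text)
```

A real page is far messier than this sample, which is the reason a dedicated extraction service exists in the first place.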
Once Diffbot supplies Semantria with the structured text, Semantria assesses its meaning and tone. Semantria’s goal is to “bring text and sentiment analysis into the hands of a nontechnical person in under 3 minutes and for less than $1,000,” according to founder and chief executive Oleg Rogynskyy.
Semantria has picked up 170 paying customers as well as 10,000 trial users since it started selling its license a year and a half ago, said Rogynskyy. And the company is profitable, he said.
Previously, Rogynskyy was the marketing director at Lexalytics, which has sold text analysis since 2003.
To make the Semantria service work quickly, even for text-mining novices, Rogynskyy’s team decided to build a plugin for Microsoft’s popular Excel spreadsheet program. The data in a spreadsheet goes to the cloud for processing, and Semantria sends back analysis in Excel format.
The software can tell, for example, which tweets mentioning the word “windows” are referring to the Microsoft operating system and which ones are talking about real windows, Rogynskyy said. (Microsoft was one of Semantria’s first customers.) Or which tweets that contain the word “coke” concern cocaine and which concern Coca-Cola.
Yankee Candle uses Semantria, Rogynskyy said, to search for information on which smells people associate with specific times of the year. Product managers can figure out, for example, whether to roll out cinnamon-scented candles before Christmas, based on whether that looks like what people want.
Semantria’s knowledge of such terms is gleaned from the associations among words found in Wikipedia’s vast collection of information. And it’s also customizable. Other tools might think the word “sucks” has a negative meaning, but for a vacuum-cleaner maker, the word can be positive, and users can adjust for that sort of preference with a couple of clicks, Rogynskyy said.
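The "sucks" example can be illustrated with a toy word-list scorer in Python. This is only a sketch of the general idea of a customizable sentiment lexicon, not Semantria's implementation or API; the word list, the scores, and the function name are invented for illustration.

```python
import re

# Hypothetical base lexicon: word -> polarity score (not Semantria's data).
BASE_LEXICON = {"great": 1.0, "love": 1.0, "broken": -1.0, "sucks": -1.0}


def score(text, overrides=None):
    """Sum the polarity of known words; overrides replace base scores."""
    lexicon = {**BASE_LEXICON, **(overrides or {})}
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0.0) for word in words)


review = "This vacuum really sucks up everything"

# With the default lexicon, "sucks" counts against the product.
default_score = score(review)

# A vacuum-cleaner maker flips the word to positive for its own domain.
vacuum_score = score(review, overrides={"sucks": 1.0})

print(default_score, vacuum_score)
```

The same review scores negative under the default word list and positive once the vacuum maker's override is applied, which is the kind of adjustment Rogynskyy describes.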
The decision to partner with Diffbot came as a result of a Semantic Web hackathon in San Francisco that both companies sponsored.
A Diffbot developer built a simple plugin for Google’s Chrome browser that changes the background color of messages on Facebook and Twitter based on sentiment — red for negative, green for positive. The concept won a prize from Semantria, Rogynskyy said. A Diffbot executive was on hand at the hackathon, and Rogynskyy started talking with him about how the two companies could work together.
And indeed, they pair nicely. “Diffbot perceives. We understand context,” Rogynskyy said. “We can contextualize the content they give to us. When they pass it on, we give it to the end user, who makes a decision based on cleanly perceived content that has been put into a context by us.”
While Semantria and Diffbot technologies continue to be available separately, they can now be used together. Two large technology companies are already using both, Rogynskyy said.