Careful what you write: How text mining could hurt your business

Tesla Motors experienced its second-highest one-minute trading volume of all time earlier this month after computers misinterpreted the company's April Fools' Day joke as a serious press release.

This reaction to the company's joke -- that it was releasing a Tesla Watch that "doesn’t just tell the time, it also tells the date" -- illustrates that we live in an odd transitional period for unstructured data analysis and artificial intelligence (AI). Today's AI systems are experts in narrow domains, helping with disease diagnosis, creating new recipes, recognizing brands in pictures, keeping order in moderated forums, and, indeed, trading stocks.

But unstructured data analysis, which uses many of the same techniques as AI, is still a relatively unsophisticated field that can deliver unsophisticated results.

Because of the massive amount of content being generated, we have no choice but to employ increasingly automated systems to analyze and make decisions for us. In the case of high-frequency trading, milliseconds are eons. Speed matters. Whoever is closest and fastest wins.

AIs are really smart but very, very narrow, so their blind spots can be profound and the interactions between them complex and somewhat unpredictable. They still lack the contextual knowledge that a nine-year-old carries around -- namely, that stuff announced on April Fools' Day should not be trusted.

There are related issues. Some company names will confuse text mining systems. There will be deliberate attacks (or “optimizations”) that exploit the knowledge that a machine is doing the analysis on the other end; just look at the whole search engine optimization market. The same sort of optimization is already happening with press releases and web content, written in the knowledge that machines will make the instant call as to whether your stock is a buy or a sell. And old words will acquire new uses (“sick!”) that completely reverse their meanings.
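
To make the point about reversed meanings concrete, here is a minimal sketch of a lexicon-based sentiment scorer, one of the simplest text mining approaches. The word list and the `naive_sentiment` function are hypothetical, invented purely for illustration; they are not taken from any real product, and production systems are considerably more sophisticated. The point is only that a fixed word list cannot know that slang has flipped a word's meaning.

```python
import re

# Hypothetical toy lexicon: word -> polarity. Invented for illustration only.
LEXICON = {
    "great": 1,
    "strong": 1,
    "record": 1,
    "sick": -1,
    "weak": -1,
    "recall": -1,
}


def naive_sentiment(text: str) -> int:
    """Sum the polarity of every lexicon word found in the text.

    There is no handling of context, slang, negation, or sarcasm.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


if __name__ == "__main__":
    # A human reads this as praise; the word list reads "sick" as negative.
    print(naive_sentiment("The new model is sick!"))
    # A plain, declarative sentence is scored as intended.
    print(naive_sentiment("The company reported strong results."))
```

The first sentence gets a negative score even though a human reader would take it as enthusiasm, while the plain sentence gets the positive score its author intended.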

And what happens when you start getting feedback between multiple systems?

Is it beyond the realm of possibility that these systems could get into a pathological feedback loop? It’s probably already happened at some level: automated text generation racing to keep up with rapidly fluctuating conditions caused by automated trading based on text mining of automatically generated text.

Strange loops, indeed.

This debate is occurring at more than one level. More than 100 nations convened under the auspices of the United Nations in Geneva last week to discuss “Lethal Autonomous Weapons Systems” (aka “killer robots”). Ethical questions aside, these are systems that depend heavily on machine learning and AI to make targeting and “fire” decisions.

Still, it’s more likely that you will be hurt by the financial robots than by the literally gun-carrying ones. At least in the near future. First and foremost, remember that every piece of content your company puts out is going to be read and interpreted by a machine. We haven’t yet reached the promise of the “Semantic Web,” where everything has nice, clean, machine-readable metadata associated with it for a magical experience, so those machines are going to be working on the raw text they are handed.

We are likely 5 to 10 years from something resembling a complete contextual understanding -- one with enough nuance to understand humor and sarcasm, and to know that trees grow up (not down).

The following are a few defensive measures you can take to protect yourself and your company:

1. Write simply when writing about your company, product, and financial results. Short, declarative sentences are not only more understandable; they’re less likely to be misinterpreted. Feel free to be really flowery in your blog posts, but for press releases and filings, simpler is more likely to be interpreted correctly.

2. If you’re doing media monitoring, run your text through the media monitoring system to see how it’s going to be interpreted. Sometimes that clever phrase you came up with is going to befuddle the natural language processing, and that can reflect negatively on you.

3. Monitor market response. If you are publicly traded, keep your announcements well spaced so that you can understand the market effect of each announcement – particularly in the initial few seconds, when humans haven’t had time to react and it’s all the machines. Even if you aren’t publicly traded, you can still get a feel for what is happening by monitoring social media – more and more of social media is going to be automatically generated from snippets of news. Was the right snippet pulled out? What were the very first tweets (again, in those precious seconds between the announcement and when a human can respond)?
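
As a rough illustration of the third measure, the sketch below counts how many events (trades, tweets, or whatever you are able to collect) land in the first few seconds after an announcement, before a human could plausibly have responded. The function name `split_early_reactions`, the five-second default window, and the sample timestamps are all assumptions made for this example; pick a window that suits your own data.

```python
from datetime import datetime, timedelta


def split_early_reactions(announced_at, event_times, window_seconds=5.0):
    """Split events into (early, later) lists relative to an announcement.

    "Early" means at or after the announcement and within window_seconds,
    an assumed stand-in for "before a human could have reacted".
    Events timestamped before the announcement are ignored.
    """
    cutoff = announced_at + timedelta(seconds=window_seconds)
    early = [t for t in event_times if announced_at <= t <= cutoff]
    later = [t for t in event_times if t > cutoff]
    return early, later


if __name__ == "__main__":
    # Made-up timestamps for illustration; not real market or social data.
    announced = datetime(2020, 1, 15, 9, 30, 0)
    offsets = (-30, 0.2, 0.9, 3.5, 42, 120)
    events = [announced + timedelta(seconds=s) for s in offsets]
    early, later = split_early_reactions(announced, events)
    print(f"{len(early)} events in the first 5 seconds, {len(later)} afterwards")
```

A large early count relative to what follows may suggest that the first reaction was machine-driven, which is worth comparing against how the announcement was worded.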

The efficiencies gained through AI are too great to be ignored, but the systems simply don’t have all the understanding necessary to prevent some of these problems and will interact in unpredictable ways. There will be many more “newsworthy” events that send more than just a little ripple through stock prices. Huge strides are being made to teach computers about the world and how things really work (or don’t, as the case may be), but expect to be reading a lot more stories about AI gone awry.

Jeff Catlin is CEO of Lexalytics. He has over 15 years of experience in the fields of search, classification, and text analytics products and services and has held technical, managerial, and senior management positions within a variety of companies, including Thomson Financial and Sovereign Hill Software. Prior to the formation of Lexalytics, Jeff acted as the General Manager for the unstructured data group of LightSpeed Software, where he was responsible for sales, marketing, and development efforts for the Knowledge Appliance and iFocus products. Prior to joining LightSpeed, he was co-owner of PleasantStreet Technologies, which produced a news-filtering product.