Gmail is now blocking 100 million more spam emails a day, thanks to TensorFlow

Did you know that the average worker devotes only 5 percent of their time to creative brainstorming? It's true, according to a Google study. Tasks like tracking, research and analysis, finishing touches, formatting, and integrating feedback -- much of it conducted via email -- crowd out ideation. In fact, the average person spends a whopping 11 hours a week sifting through work messages, and the approximately 14.5 billion spam emails sent (and received) each day only exacerbate the distractions.

Fortunately for the 1.5 billion people who use Gmail every month, and for employees of the 5 million businesses that use Gmail through G Suite, there's good news: Google's looking out for you. Since as early as 2015, it has claimed to block 99.9 percent of spam, phishing, and malware from reaching accounts with the help of artificial neural networks (layers of math functions loosely modeled after neurons in the brain), all while ensuring the share of legitimate mail that inadvertently ends up in spam folders stays below 0.05 percent. (That former metric is up from 99 percent of spam email in 2012.) And now Google says that, thanks in part to "new protections" powered by its open source machine learning framework TensorFlow, it's blocking around 100 million additional spam messages every day.

The company previously said it protects users from 10 million spam and malicious emails every minute.
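To put the two figures side by side, the per-minute number can be converted to a daily one. The short Python calculation below is only a back-of-the-envelope sketch: it assumes the 10-million-per-minute rate holds steady around the clock, which Google has not stated, and the variable names are made up for illustration.

```python
# Figures quoted in the article.
BLOCKED_PER_MINUTE = 10_000_000      # spam and malicious emails per minute
ADDITIONAL_PER_DAY = 100_000_000     # extra spam messages blocked daily

# Assumption: the per-minute rate is constant over a full day.
MINUTES_PER_DAY = 24 * 60

blocked_per_day = BLOCKED_PER_MINUTE * MINUTES_PER_DAY
additional_share = ADDITIONAL_PER_DAY / blocked_per_day

print(f"Blocked per day: {blocked_per_day:,}")            # 14,400,000,000
print(f"Additional share: {additional_share * 100:.2f}%")  # 0.69%
```

Under that assumption, the extra 100 million messages amount to well under 1 percent of what Gmail already blocks daily, which fits Kumaran's description of TensorFlow catching "incremental" spam.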

"We've utilized [machine learning] in the past, [and] we['ve] also [had] a number of rules and rules-based protections," Neil Kumaran, product manager at Gogole's Counter Abuse Technology division, told VentureBeat in a phone interview. "What's interesting about TensorFlow, though, is [that] it's good at detecting some of the incremental spam that's left."

It's not that Gmail engineers haven't been leveraging TensorFlow to improve spam detection -- far from it. Google noted the team's migration to the framework in a blog post dated May 2017. Rather, this latest milestone was achieved through broader adoption of TensorFlow and a larger set of spam classifications "at scale," according to Kumaran.

"Gmail has been protecting ... users [from] spam, phishing, and malware for a while now, and [we've] utilize[d] a variety of techniques," he added.

So, what sort of incremental spam has Gmail gotten better at deflecting? Mostly image-based spam. In image spam, which made up 8 percent of all spam traffic in 2011, textual messages are embedded into photos attached to emails -- the sorts of photos that email clients display directly to users. It's an attempt to circumvent text-analyzing spam filters and optical character recognition (OCR) tools, and some of the schemes are quite sophisticated. Scammers apply obfuscation that prevents the embedded text from being read by OCR tools, or attempt to mislead signature-based detection algorithms with nonsense words.

Kumaran said that TensorFlow has afforded the Gmail team flexibility in experimentation, which in turn has allowed it to more quickly deploy new methods that combat image spam and other novel message types.

"[Spammers are constantly using] new domains -- for example, [sending] some legitimate email out and then starting to send spam messages. ... Historically our staff [has been] tapping TensorFlow as a way to catching those," Kumaran said. "It's provided us a better toolset, [and] allow[ed] us to run experiments much more quickly."

Gmail and AI

Google has long relied on users to signal which emailed messages are the real deal -- and which aren't. And it's used that data to train its AI systems.

The "report spam" and "not spam" buttons in Gmail aren't just for show -- they inform mail-filtering machine learning algorithms, improving their ability to detect unwanted emails and reflect individual preferences. (The latter bit is why some users get weekly newsletters in their inboxes while others don't.) And they complement Google's phishing-detecting models, which in some cases delay messages to perform detailed analysis as its algorithms update in real time.

"Everybody's definition of spam may not be the same -- what may be spam for one person may not be spam to another," Kumaran said. "Personalization is possible with [machine learning] -- [it does] a really good job of taking into account the users particular preferences."

Some might be quick to accuse Google of painting bulk emailers with a broad brush, but it's offered the occasional olive branch. In 2015, for instance, it released Postmaster Tools, a diagnostic suite that exposes delivery errors, spam reports, and other potentially useful analytics to companies that send bulk emails to Gmail users.

The Mountain View company claims that, at the end of the day, it's trying to keep users and organizations safe from cybercriminals. A 2015 McAfee survey found that 97 percent of people couldn't correctly identify phishing scam emails -- emails that cost the average mid-sized company about $1.6 million, if they're successful.

If all goes according to plan, spam-detecting algorithms won't be the only thing TensorFlow is advancing. Kumaran said that the team's already testing improved phishing- and malware-filtering models that it hopes to detail in the future. "Hopefully, we will have similar blog posts in the future about that, too," Kumaran said.

Beyond the message-filtering domain, AI is creeping into facets of Gmail new and old. In 2016, Google introduced Smart Reply, a feature that uses machine learning to suggest replies to email, which it's continued to refine and improve as it brings it to platforms like Hangouts Chat. (According to Google, Smart Reply processes hundreds of millions of messages daily and drives more than 10 percent of email replies on mobile.) More recently, Google rolled out Smart Compose, which uses AI to autocomplete words and phrases, and "nudges," a machine learning-driven feature that delivers email follow-up reminders. Those are in addition to high-priority notifications, a setting that restricts email notifications to "important messages," and newsletter unsubscribe suggestions.