Why deep learning isn't always the best AI solution

Deep learning is a relatively new approach to artificial intelligence and an active, fast-moving area of research; we can expect its advances to become market-ready over the next several years. Unfortunately, market hype has turned deep learning into a buzzword, feeding the misconception that other approaches to AI are no longer relevant. After all, if you are not doing deep learning, surely you must be doing shallow learning, right?

In cybersecurity, we use various techniques, such as statistics, probability theory, and multiple machine learning algorithms (of which deep learning is one example), to look at use cases and the data available, selecting the best math or algorithm for the job. We take data from various sources -- application logs, source code, etc. -- choosing the right algorithms based on our understanding of the dataset and use case. This process is fairly artisanal because we are working with a relatively small dataset and the behaviors we are detecting are often very subtle, such as detecting insider threat from source code audit logs. Deep learning is just another specific technique within AI.

Simply described, deep learning is a class of machine learning algorithms that learns by using a large, many-layered collection of connected processors and exposing those processors to a vast set of examples. Deep learning is becoming practical across various industries because we have access to large amounts of compute power, through technologies such as cloud computing and GPUs, and to large datasets. With such datasets at our disposal, research in deep learning techniques is fast and furious. Malware detection is one great example: it is the focus of several security startups attempting to leverage the large set of malware examples accumulated over many years. Other efforts apply deep learning to smaller datasets; for example, one area of research looks at how much data is needed to train a medical image deep learning system.
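To make the "many-layered collection of connected processors" concrete, here is a minimal sketch in plain Python. The layer sizes, the random weights, and the helper names (`init_layer`, `dense`, `forward`) are illustrative assumptions, not taken from any particular system. The sketch shows only how an input flows through stacked layers; the training step, in which the weights are adjusted by exposure to many examples, is deliberately left out.

```python
import math
import random

def relu(values):
    # Each "processor" passes on its input only when it is positive.
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    # One layer: every output unit is connected to every input unit.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    # Untrained, random weights (assumption: real systems learn these).
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    # Pass the input through each hidden layer in turn, then squash
    # the single output unit into a score between 0 and 1.
    activation = x
    for weights, biases in layers[:-1]:
        activation = relu(dense(activation, weights, biases))
    weights, biases = layers[-1]
    logit = dense(activation, weights, biases)[0]
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
sizes = [4, 8, 8, 1]  # hypothetical: 4 input features, two hidden layers
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(sizes, sizes[1:])]

# Count the adjustable numbers (weights and biases) in this tiny network.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)

score = forward([0.2, 0.7, 0.1, 0.9], layers)
print(f"parameters: {n_params}, untrained score: {score:.3f}")
```

Even this toy network has over a hundred adjustable numbers. Production networks have many orders of magnitude more, which is one reason they need so many examples to train.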

Detecting malware using deep learning makes sense because we already have a large dataset that characterizes malware. The same cannot be said for insider threats. We do not yet have access to enough data from companies that have experienced these types of attacks. We have anecdotes and sometimes simulated data based on actual events, but anecdotes cannot be used by deep learning networks, and actual log files that correspond to true insider threats are few and far between, although this may change over time. Without large volumes of data on which to base our features, deep learning is simply overkill (or worse, ineffective) for insider threat detection, at least today.

In the future, the ability of deep learning networks in security to automatically adjust and tune their connections as data volumes grow will improve the learning process. In particular, this will allow us to automate more of the work and to use networks that specialize in certain areas. The networks will learn which portions of the data are more predictive than others, in a way that reduces the dependency on human data scientists to guide the learning process. This "automatic feature learning" is potentially a very big deal for security. With deep learning, the security system can automatically learn by trying billions of combinations and making millions of observations. The potential for more accurate results is driving the excitement that we are currently experiencing as hype.

In the meantime, deep learning systems are very tricky to set up. They are complex and costly, and many so-called hyperparameters are difficult to determine in advance without a lot of experience or experimentation. Training a deep learning model can require several orders of magnitude more compute capacity and cost compared to other, more straightforward machine learning models. For example, a logistic regression model is simple enough to run on one machine for a small dataset, and it remains a very effective approach for many classification tasks today. Progress in hardware acceleration (via GPUs and more recently Google's TPUs) promises to drive the cost per computational unit down. But today, deep learning systems remain the most expensive machine learning method by a wide margin, and that alone may price the approach beyond the range of most use cases.
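As a rough illustration of how little machinery a logistic regression model needs, the sketch below trains one with batch gradient descent using only the Python standard library. The two-feature synthetic dataset, the learning rate, and the epoch count are all assumptions chosen for the example; they do not come from any real security data.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, b, x):
    # Probability that example x belongs to class 1.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic_regression(X, y, lr=0.1, epochs=500):
    # Batch gradient descent on the average log loss.
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic toy data (made up for illustration): two Gaussian clusters.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

w, b = train_logistic_regression(X, y)
predictions = [1 if predict_proba(w, b, xi) >= 0.5 else 0 for xi in X]
accuracy = sum(p == yi for p, yi in zip(predictions, y)) / len(y)
print(f"weights={w}, bias={b:.3f}, training accuracy={accuracy:.2f}")
```

The whole model here is three numbers (two weights and a bias), and the only tuning knobs are the learning rate and the number of epochs, which is a sharp contrast with the many hyperparameters of a deep network.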

Thus, deep learning is still just one of many machine learning methods. It is very promising when aimed at a specific class of problems, but it is not a silver bullet. The fact that a technology uses deep learning does not mean that traditional AI and machine learning approaches are less valuable or practical; for a given problem, they may well be more so. Artificial intelligence is a multi-purpose technology that we can put to work in security and other industries, learning, iterating, and improving as we go.

In security, we know that you don't have to go deep to catch the bad guy. At the end of the day, as long as the good guys win and the bad guys lose, the actual weapon used doesn't matter.

Stephan Jou is the CTO of Interset, a data analytics company.
