The first job for many artificial intelligence (AI) algorithms is to examine the data and find the best classification. An autonomous car, for example, may take an image of a street sign; the classification algorithm must interpret the street sign by reading any words and comparing it to a list of known shapes and sizes. A phone must listen to a sound and determine whether it is one of its wake-up commands (“Alexa,” “Siri,” “Hey Google”).
The job of classification is sometimes the ultimate goal of an algorithm. Many data scientists use AI algorithms to preprocess their data and assign categories. Simply observing the world and recording what is happening is often the main job. Security cameras, for example, are now programmed to detect certain activity that might be suspicious.
In many cases, the classification is just the first step of a larger algorithm. The autonomous car will use the classification of a street sign to make decisions about stopping or turning. A smart vacuum cleaner may watch for pets or children and shut down if one is detected.
What are the types of classification algorithms used in artificial intelligence?
There is a wide range of algorithms, from general approaches that can train themselves to answer many types of questions to focused applications that work on particular domains. For example, optical character recognition algorithms are used to convert paper scans into digital documents by classifying each letter in the image.
Other algorithms are designed to work with numerical data. They may divide the range of values into sections, one for each possible answer. A simple algorithm for classifying pets as either dogs or hamsters may succeed by examining the weight alone. Any pet weighing more than one pound would be classified as a dog and any pet weighing less than a pound would be classified as a hamster.
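The weight rule described above fits in a few lines of code. This is only a minimal sketch: the function name `classify_pet` and the constant name are illustrative, and the handling of a pet that weighs exactly one pound is an assumption, because the example does not cover that case.

```python
# One-pound threshold from the example above; the names are illustrative.
THRESHOLD_POUNDS = 1.0


def classify_pet(weight_pounds: float) -> str:
    """Classify a pet as a dog or a hamster using its weight alone."""
    # Assumption: a pet weighing exactly one pound is labeled a hamster,
    # since the example does not say how to handle the boundary.
    if weight_pounds > THRESHOLD_POUNDS:
        return "dog"
    return "hamster"


if __name__ == "__main__":
    for weight in (0.25, 12.0, 65.0):
        print(weight, "->", classify_pet(weight))
```

The whole model here is one comparison against one number, which is why a human can program it directly.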
Other algorithms are more elaborate and rely upon multi-stage models with elaborate feedback loops. Some machine learning algorithms simulate networks of neurons and they often have thousands, millions or even billions of simulated neurons in them. Each simulated neuron is tuned individually to react to the data and produce an answer. These answers from individual neurons are often fed into another stage of simulated neurons, and the entire network produces the classification as the individual answers flow through the network.
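To make the idea of answers flowing from one stage of simulated neurons to the next more concrete, here is a rough sketch of a two-stage network. The weights are hypothetical values picked by hand for illustration, the sigmoid is one common choice of neuron function rather than the only one, and the class names are placeholders. A real network would learn its weights during training.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """One simulated neuron: a weighted sum of its inputs passed through a sigmoid."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)


def forward(inputs, layers):
    """Feed the inputs through each stage; a stage is a list of (weights, bias) pairs."""
    values = list(inputs)
    for layer in layers:
        values = [neuron(values, weights, bias) for weights, bias in layer]
    return values


# Hypothetical, hand-picked weights for illustration only.
LAYERS = [
    [([0.8, -0.4], 0.1), ([-0.3, 0.9], 0.0)],  # first stage: two neurons
    [([1.5, -1.5], 0.0)],                      # second stage: one output neuron
]

if __name__ == "__main__":
    score = forward([1.0, 0.5], LAYERS)[0]
    # Placeholder class names; the 0.5 cutoff is an assumption.
    print("class A" if score >= 0.5 else "class B")
```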
How are the classification algorithms trained?
Some simple models for the classification can be trained or programmed by a human who understands the domain. The example above of the algorithm that can determine whether a pet is a dog or a hamster is very simple and the human’s domain knowledge is easy to transfer to the model.
However, most machine learning algorithms aren’t as simple and training them requires running another algorithm. It is common for machine learning scientists to create a training subset of the data. This is fed into the training algorithm, which searches for the best parameters and settings for the parts of the model. In our simple example of distinguishing between dogs and hamsters, the threshold of one pound is the only parameter in that model. In practice, many machine learning algorithms set millions or even billions of parameters in the process of training.
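For the one-parameter pet model, a training algorithm can be as simple as trying candidate thresholds and keeping the one that makes the fewest mistakes on the training data. The sketch below assumes made-up training examples and illustrative function names. It is one possible search strategy, not a standard library routine.

```python
def predict(weight, threshold):
    """Apply the one-parameter model: heavier than the threshold means dog."""
    return "dog" if weight > threshold else "hamster"


def count_errors(examples, threshold):
    """Count training examples that the threshold classifies incorrectly."""
    return sum(1 for weight, label in examples if predict(weight, threshold) != label)


def train_threshold(examples):
    """Search candidate thresholds and return the one with the fewest errors.

    Assumes the examples contain at least two distinct weights.
    """
    weights = sorted({weight for weight, _ in examples})
    # Candidates: midpoints between neighboring weights seen in training.
    candidates = [(a + b) / 2 for a, b in zip(weights, weights[1:])]
    return min(candidates, key=lambda t: count_errors(examples, t))


# Made-up training data: (weight in pounds, label).
TRAINING = [
    (0.1, "hamster"), (0.2, "hamster"), (0.3, "hamster"),
    (8.0, "dog"), (25.0, "dog"), (60.0, "dog"),
]

if __name__ == "__main__":
    threshold = train_threshold(TRAINING)
    print("learned threshold:", threshold)
    print("training errors:", count_errors(TRAINING, threshold))
```

With this data, the search settles on a value in the gap between the heaviest hamster and the lightest dog instead of exactly one pound. Any threshold in that gap separates these examples equally well.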
A common step in the process is to set aside some subset of the initial dataset to evaluate the quality of the results. This data is kept separate from the training process as a control group. Because the model never sees the segregated data during training, testing on it gives a more honest measure of quality and can reveal whether some unforeseen bias crept into the model.
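A held-out evaluation of this kind might look like the following sketch. The data, the 25% holdout fraction, the fixed random seed and the midpoint rule used for fitting are all assumptions chosen to keep the example small.

```python
import random


def predict(weight, threshold):
    return "dog" if weight > threshold else "hamster"


def accuracy(examples, threshold):
    """Fraction of examples that the threshold classifies correctly."""
    correct = sum(1 for w, label in examples if predict(w, threshold) == label)
    return correct / len(examples)


def split(examples, holdout_fraction=0.25, seed=0):
    """Shuffle the data and set part of it aside; returns (train, holdout)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def fit_threshold(train):
    """Assumed fitting rule: midpoint between the heaviest hamster and lightest dog.

    Requires both labels to be present in the training data.
    """
    heaviest_hamster = max(w for w, label in train if label == "hamster")
    lightest_dog = min(w for w, label in train if label == "dog")
    return (heaviest_hamster + lightest_dog) / 2


# Made-up data: (weight in pounds, label).
EXAMPLES = [
    (0.1, "hamster"), (0.15, "hamster"), (0.2, "hamster"), (0.3, "hamster"),
    (6.0, "dog"), (12.0, "dog"), (30.0, "dog"), (70.0, "dog"),
]

if __name__ == "__main__":
    train, holdout = split(EXAMPLES)
    threshold = fit_threshold(train)
    # The holdout examples played no part in choosing the threshold.
    print("training accuracy:", accuracy(train, threshold))
    print("holdout accuracy:", accuracy(holdout, threshold))
```

A gap between the training score and the holdout score is the signal to look for. It suggests the model fits the training examples better than it fits data it has not seen.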
In addition, some projects require careful preprocessing and data cleansing, sometimes along with an “embedding” step that converts the data into a standard numerical form. This standardizes the data and introduces a simple structure that can simplify the process. Some numbers, for instance, may be rounded off. Some words may be converted to all capital letters. Occasionally, a separate classification algorithm is used to perform this step.
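The two cleanup steps mentioned above, rounding numbers and capitalizing words, could be sketched as follows. The record layout and the function name `clean_record` are hypothetical, and trimming whitespace is an extra assumption that the text does not mention.

```python
def clean_record(record: dict, digits: int = 1) -> dict:
    """Return a standardized copy of one data record."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            # Convert words to all capital letters (and trim stray spaces).
            cleaned[key] = value.strip().upper()
        elif isinstance(value, float):
            # Round numbers to a fixed number of decimal places.
            cleaned[key] = round(value, digits)
        else:
            # Leave other value types unchanged.
            cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    raw = {"species": " Hamster ", "weight": 0.2349, "age": 2}
    print(clean_record(raw))
```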
What are some of the best-known classification algorithms?
The classification algorithms used in AI are a mixture of statistical analysis and algebra, arranged in flowcharts and decision trees. Some approaches predate the idea of creating machine intelligence, emerging from the fields of statistics, calculus and numerical analysis.
Many artificial intelligence models use a combination of different approaches and algorithms. Indeed, the choice of algorithm can be a bit of an art. Scientists have a feel for which approaches may work best and they may try numerous combinations until they find a predictive solution.
Some of the best known approaches are:
- Simple regression: Several good techniques can fit a line or a polynomial to a set of data points. Minimizing the sum of the squared distances between the points and the line is a common technique. Once this line is drawn, a threshold may be set and the possible outcomes from classification are mapped to portions of the line.
- Logistic regression: This also uses curve fitting techniques but with more complex curves, often sigmoid functions. The large jump in the sigmoid can be adjusted to provide a good threshold between the classification options.
- Bayesian: Another option is to use bell curves, often called Gaussian functions, to match the data. This works well for clusters. Several bell curves can fit several different clusters and the best threshold can be set by their intersections.
- Support vector machines: This is similar to fitting a line but extends it into multiple dimensions. A plane or collection of planes is positioned to maximize the distance to the nearest points on each side. These planes become the thresholds separating the space.
- Decision tree: Some problems are complex enough that a single regression or threshold isn’t effective. A decision tree creates a flowchart or tree with multiple decisions at each step. In many cases, different variables are used at each step. The process is best for complex datasets where different variables behave very differently, such as when some variables are Boolean and others numerical.
- Random forest: Finding the best collection of decisions for the best tree can be difficult because the possible options increase quickly with the complexity of the data set. The random forest builds many trees from random subsets of the data and variables, then combines their answers, often by majority vote.
- Nearest neighbor: Instead of cutting up a data set with lines or planes, the nearest neighbor approach looks for definitive points in the space. New data points are classified by finding the nearest definitive point in the space. In some cases, the algorithms find a set of weights for the various data fields to adjust how the distance is calculated.
- Neural networks: These are more elaborate AI algorithms that simulate collections of neurons that are arranged in a network. Each neuron can make a simple decision based upon its inputs. The decisions flow through the network until a final classification is made.
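As an illustration of one approach from the list above, here is a small sketch of nearest neighbor classification with per-field weights in the distance calculation. The reference points, the second field (body length) and the weight values are all made up for the example. Real systems would derive them from data.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance in which each data field has its own weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def nearest_neighbor(point, labeled_points, weights):
    """Return the label of the known point closest to the new point."""
    closest = min(
        labeled_points,
        key=lambda item: weighted_distance(point, item[0], weights),
    )
    return closest[1]


# Made-up reference points: (weight in pounds, body length in inches).
KNOWN = [
    ((0.2, 4.0), "hamster"),
    ((0.3, 5.0), "hamster"),
    ((20.0, 30.0), "dog"),
    ((60.0, 40.0), "dog"),
]

# Hypothetical field weights: body length counts half as much as weight.
WEIGHTS = (1.0, 0.5)

if __name__ == "__main__":
    print(nearest_neighbor((0.25, 4.5), KNOWN, WEIGHTS))
    print(nearest_neighbor((35.0, 33.0), KNOWN, WEIGHTS))
```

Changing the field weights changes which known point counts as nearest. That is the adjustment to the distance calculation described in the nearest neighbor entry.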
How are the major companies attacking classification systems with artificial intelligence?
All of the major cloud companies maintain strong programs in developing and marketing artificial intelligence applications. Each can tackle classification problems using its built-in algorithms. Helping customers sort through and label data is one of the first and best applications for their AI tools.
Amazon’s SageMaker, for example, supports many of the best classification algorithms, including nearest neighbor and regression. Its documentation includes a variety of examples for labeling text and image data using all the possible algorithms. The models can also be deployed with many products, such as DeepLens, a fully programmable video camera that can handle some classification problems internally.
Google’s AI tools, such as Vertex AI, can be applied directly to labeling data. The AutoML tool includes a number of predefined and automated procedures for classifying image or textual data. There are also several specialized tools and APIs designed for some of the most important use cases. The Cloud Data Loss Prevention tool is optimized for detecting sensitive personal information and then obscuring it. The Cloud Natural Language API has several pretrained models for tasks like analyzing sentiment or classifying content.
Microsoft’s Azure offers a wide range of tools that start with supporting basic experimentation and end with pre-built applications for important common tasks. The early work is supported with Jupyter notebooks and a drag-and-drop interface. The Azure Applied AI Services include tools optimized for jobs like form recognition and digitization, video analysis for tasks like improving safety through surveillance, and the Metrics Analyzer for tracking anomalies in log files.
IBM’s products support classification through data science platforms like SPSS and pure AI algorithms. Beyond basic experimentation and exploration, IBM also supports a number of focused tools like the Security Discovery and Classify tool, which can help lock down websites and prevent data loss. The Watson Natural Language Understanding tool now includes a feature for creating classification models for text with just a few steps.
Oracle’s product line also includes a wide mixture of tools for basic experimentation, as well as focused systems that tackle particular chores. The Human Capital Management tool in their cloud supports HR departments and offers some AI-based features for classifying employees according to their skills with a Skills Engine and a Skills Nexus. The AI Services have many prebuilt models for analyzing speech, text and imagery.
How are startups approaching artificial intelligence classification?
Startup companies that are solving the problems of classification with artificial intelligence algorithms are also targeting a wide range of markets. Some want to build basic tools that researchers, data scientists and enterprises can deploy. They’re exploring some of the most novel approaches and avenues.
Many companies are also applying the algorithms directly to specific niches or applications. They’re focusing on adapting the approaches to the particular idiosyncrasies of the domain by customizing the data collection, cleansing and embedding into a training set. Many of these don’t sell themselves as artificial intelligence companies, even though much of the value they create comes from the algorithms.
Affirm, for instance, is a fintech firm offering loans to shoppers. Its “Debit+” card offers 0% APR loans for particular items at sponsoring stores like Lowe’s or Peloton. Other purchases are cleared like normal debit transactions. The AI algorithms work in the background to classify the customers and their purchases.
Clarifai offers a wide range of powerful low-code and no-code classification pipelines for processing text, audio, imagery and video. The Flare Edge tool, for instance, is designed to deploy the classification models to cameras and sensors throughout the internet to speed classification by eliminating the need to ship imagery to a data center.
Symbl AI works with unstructured text and audio to detect conversational topics and classify them according to tone and intent. It integrates with video, telephony, text and streaming sources.
Vectra AI analyzes networks on premises and in data centers to classify threats and identify potential security holes. It watches for dangerous activity like large-scale data exfiltration or encryption to identify the most dangerous threats.
Is there anything that artificial intelligence can’t classify?
Scientists have a wide range of possible classification functions, and they can often find a good match given enough training data. The problems often appear later, when the new data forms a different pattern from the original training data. Even small changes can be significant because sometimes the models are sensitive to tiny shifts in values. Some implementations deliberately use a feedback mechanism to retrain the model over time.
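A feedback mechanism of this kind can be sketched with the simple pet-weight threshold model: measure accuracy on recent labeled data and retrain when it falls too far. Everything here is an assumption made for illustration, including the data, the 90% accuracy floor and the midpoint fitting rule. Production systems generally need more careful monitoring.

```python
def predict(weight, threshold):
    return "dog" if weight > threshold else "hamster"


def accuracy(examples, threshold):
    """Fraction of examples that the threshold classifies correctly."""
    correct = sum(1 for w, label in examples if predict(w, threshold) == label)
    return correct / len(examples)


def fit_threshold(examples):
    """Assumed fitting rule: midpoint between the heaviest hamster and lightest dog.

    Requires both labels to be present in the examples.
    """
    heaviest_hamster = max(w for w, label in examples if label == "hamster")
    lightest_dog = min(w for w, label in examples if label == "dog")
    return (heaviest_hamster + lightest_dog) / 2


def maybe_retrain(threshold, recent, min_accuracy=0.9):
    """Retrain on recent labeled data if accuracy has fallen below the floor."""
    if accuracy(recent, threshold) < min_accuracy:
        return fit_threshold(recent)
    return threshold


# Made-up original data: only large dogs.
ORIGINAL = [(0.1, "hamster"), (0.2, "hamster"), (0.3, "hamster"),
            (20.0, "dog"), (40.0, "dog"), (60.0, "dog")]

# Made-up newer data: small dog breeds start to appear.
SHIFTED = [(0.2, "hamster"), (0.3, "hamster"),
           (4.0, "dog"), (6.0, "dog"), (8.0, "dog")]

if __name__ == "__main__":
    old = fit_threshold(ORIGINAL)
    print("accuracy on new data before retraining:", accuracy(SHIFTED, old))
    new = maybe_retrain(old, SHIFTED)
    print("accuracy on new data after retraining:", accuracy(SHIFTED, new))
```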
Problems can also arise when the data set includes inadvertent patterns. A common difficulty with visual datasets comes from the lighting of the subjects. If a training set is filled with photos taken inside, the model may not perform correctly when the new images come from outside or are taken at dusk, for example. Eliminating these subtle differences can be a challenge because humans may not be aware of them. Assembling larger and larger training sets is a common approach to try to ensure that all possible combinations are reflected in the data set.
Other problems can arise when the sensors detect very subtle differences that aren’t obvious to the scientists. For example, human skin often becomes slightly redder during the moments when blood is being pumped through it. Some systems use a camera alone to sense and measure someone’s pulse. The amount of this flush, though, is rarely enough for human eyes to see. Well-functioning machine learning algorithms can point out subtle differences like this to the human, but sometimes the human discards them as noise.