Google Assistant is the humbly named artificial intelligence agent that lives in Google’s new Pixel phone, and it represents the most powerful accumulation of applied data science in history. Google feeds its various intelligence processes, from DeepMind to Knowledge Graph to Expander, with exabytes of data from its billions of active users. The data comes in a variety of forms, such as:

  • Search requests and responses
  • Text and network connections from Gmail
  • Browser behaviors via Chrome
  • Application actions from Android
  • Images from Street View
  • Driver information from Google Maps and Waze

Some of this data is labeled: say, an image with a street location, a greeting in an email with a positive response, a click for a search result, or a connection between you and your friends on Google Plus. However, most of the data is not labeled in any human-intelligible manner. Despite the lack of explicit labeling, Assistant is able to infer labels, and that inference gives it the flexibility to act more and more human in its interactions.

How could a data store this vast and varied possibly be integrated into a model that actually predicts something meaningful for an individual person at an individual time, such as what you should have for lunch, or how you should reply to your mom’s text about Thanksgiving? What kind of intelligence are we talking about here?

There is a precedent: the human baby.

To speak like a human, learn like a human

It shouldn’t be a surprise that as Google and other companies increasingly seek to mimic the human experience in their software interactions, they will eventually start to mimic the human learning experience itself. Bots use words to receive and send information, and if humans are good at one thing, it’s learning words.

Babies acquire incredibly voluminous streams of data through their nascent neural connections and, at first, absolutely none of this is labeled with words, because the baby hasn’t heard words yet. Despite the fact that there are no words, infants learn through other labels such as touch, pain, smell, and color.

Occasionally, as the infant grows, the infant will hear words associated with some objects, and these words might be heard repeatedly over time. The growing human learns to associate “mama” with a particular face, and then later the word “dog” with a particular form. But how does the baby learn about dog the concept? Babies learn concepts, like dog and house and mama, incredibly quickly by matching labels with associations through a process called semi-supervised learning.

Supervised learning occurs when a label is provided for a specific object with specific features, and a rubric is learned to estimate labels for unknown objects based on their features. For example, every time a child points to an object and asks “Dog?” the child receives a yes or no from an adult, and over time that creates a highly accurate model of what dog means. However, supervised learning can be time- and resource-intensive, as well as prone to inaccuracy.
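To make the “Dog?” example concrete, here is a minimal sketch of supervised learning as a one-nearest-neighbor classifier. The features (weight in kilograms and whether the object barks), the example data, and the `predict` function are all hypothetical and chosen only for illustration; a real system would use far richer features and a more robust model.

```python
import math

# Hypothetical labeled examples: (weight_kg, barks) -> the adult's answer to "Dog?"
labeled = [
    ((30.0, 1.0), "dog"),
    ((8.0, 1.0), "dog"),
    ((4.0, 0.0), "not dog"),
    ((500.0, 0.0), "not dog"),
]

def predict(features, examples=labeled):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    nearest = min(examples, key=lambda ex: math.dist(features, ex[0]))
    return nearest[1]

print(predict((25.0, 1.0)))  # a mid-sized object that barks
print(predict((5.0, 0.0)))   # a small object that does not bark
```

Every prediction here depends on an adult having already labeled nearby examples, which is exactly why this approach gets expensive as the world of objects grows.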

Semi-supervised learning

In the human language realm, as in the realm of Pixel’s artificial intelligence, those concrete object-label associations are few and far between. Yet a baby can be given just one example of cute and then perhaps quickly attribute the label to all of the cats in the world. This can be done by putting a high value on the associations between objects so that, when a label is given to one object, a certain amount of attribution is given to all the associated objects.

In very broad terms, this use of direct and indirect associations between features, labels, and objects is what powers semi-supervised learning.
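As a rough sketch of that idea, the snippet below takes a single labeled object and hands a share of its label to each associated object, scaled by how strong the association is. The object names, the association strengths, and the `attribute_label` helper are assumptions made up for this example, not anything taken from Google's systems.

```python
# Hypothetical association strengths (0 to 1) between one labeled cat and other objects.
associations = {
    "cat_2": 0.9,
    "cat_3": 0.7,
    "teapot": 0.1,
}

def attribute_label(label_confidence, associations):
    """Give each associated object a share of the label, scaled by association strength."""
    return {obj: label_confidence * weight for obj, weight in associations.items()}

# The first cat was explicitly called "cute", so its confidence is 1.0.
cute_scores = attribute_label(1.0, associations)
for obj, score in sorted(cute_scores.items(), key=lambda kv: -kv[1]):
    print(f"{obj}: {score:.2f}")
```

This covers only direct associations from one labeled object; indirect associations require passing scores along several hops of a graph.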

[Figure: semi-supervised graph learning]

Above: In semi-supervised graph learning, object associations lead to probabilities for labels of every object. The strong link, or “edge,” between the term “cute” and the initial cat leads to a higher probability that the associated object, or “node,” is also cute.

Semi-supervised learning works by keeping track of associations between objects or concepts through a process known as graph modeling. Each data point (that is, each phrase that might be used as an email reply) is associated through a graph model with all of the other phrases based on when they appeared, what the context of the interaction was, what the location was, and any number of other connections.
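One simple way to picture such a graph is to link two reply phrases whenever they show up in the same context, and to make the link stronger each time that happens. The sketch below does that with made-up data; the `contexts` list and the `build_graph` function are hypothetical, and a production system would likely weigh many more signals than co-occurrence.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical observations: each set holds reply phrases seen in the same context.
contexts = [
    {"Sounds good", "See you then", "Great, thanks"},
    {"Sounds good", "Great, thanks"},
    {"Sorry, can't make it", "Maybe next time"},
    {"See you then", "Maybe next time"},
]

def build_graph(contexts):
    """Return {phrase: {neighbor: weight}}, where weight counts shared contexts."""
    graph = defaultdict(lambda: defaultdict(int))
    for phrases in contexts:
        # Every pair of phrases in the same context gets a stronger edge.
        for a, b in combinations(sorted(phrases), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {node: dict(neighbors) for node, neighbors in graph.items()}

graph = build_graph(contexts)
print(graph["Sounds good"])
```

Phrases that never share a context simply have no edge between them, although they may still be connected indirectly through a common neighbor.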

Google’s tool for using graph modeling for semi-supervised learning is known as Expander. It’s described in more detail on the company’s research blog. As an example, in the simplified model above, if there is a stronger association between the term “cute” and the first cat, then subsequent connections in that cat’s network will be more likely to be labeled cute. Multiply this association probability system by exabytes and you’ve got the smartest bot yet.
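For readers who want to see how labels might flow through a network like this, here is a small sketch of generic label propagation. It is not Expander's actual algorithm, which the article does not detail; the graph, the edge strengths, the seed labels, and the `propagate` function are all assumptions for illustration. Known labels are held fixed, and every other node repeatedly takes the weighted average of its neighbors' scores.

```python
# Hypothetical weighted graph: node -> {neighbor: edge strength}
graph = {
    "cat_1": {"cat_2": 0.9, "cat_3": 0.2},
    "cat_2": {"cat_1": 0.9, "cat_4": 0.8},
    "cat_3": {"cat_1": 0.2, "brick": 0.9, "cat_4": 0.3},
    "cat_4": {"cat_2": 0.8, "cat_3": 0.3},
    "brick": {"cat_3": 0.9},
}

# Known "cute" scores: cat_1 was labeled cute, the brick was labeled not cute.
seeds = {"cat_1": 1.0, "brick": 0.0}

def propagate(graph, seeds, iterations=50):
    """Spread seed scores across the graph by repeated weighted averaging."""
    # Unlabeled nodes start at 0.5, meaning "no opinion yet".
    scores = {node: seeds.get(node, 0.5) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in seeds:
                updated[node] = seeds[node]  # labeled nodes stay fixed
                continue
            total = sum(neighbors.values())
            updated[node] = sum(w * scores[n] for n, w in neighbors.items()) / total
        scores = updated
    return scores

scores = propagate(graph, seeds)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.2f}")
```

In this toy graph, the cat tied most strongly to the labeled cat ends up with the highest score among the unlabeled nodes, while the cat tied mostly to the brick ends up with the lowest.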

All of this, and Google Assistant is barely in its infancy…