Drones and other autonomous robots require mobile and efficient solutions to real-life issues, from mundane package transportation to urgent search and rescue missions. Using machine learning and a vector-based navigation system inspired by insects, agents could navigate to key locations without relying on a GPS — becoming truly autonomous.

Robots could learn to navigate independently to wildfires based on environmental sensory cues, using information from cameras and other sensors. Since the vectors are represented in a geocentric (world-fixed) frame of reference, multiple agents could communicate locations to one another, which could, for example, speed up efforts to perform rescues and put out fires.

Such flexibility and speed of coordination would greatly improve the success and efficiency of rescue missions during natural disasters, and save lives. Learning from nature will help future efforts in autonomous, long-distance navigation through complex real-world environments.

What we can learn from ants

Ants and bees are excellent navigators. For example, Sahara Desert ants survive harsh conditions by foraging for food in temperatures above 60° C (140° F). In this extreme environment, they cannot rely on pheromone trails, as other ants do, to find their way back to the nest after long journeys. Instead, they perform a biological computation called path integration. This mental path math involves combining a skylight compass (they perceive patterns in the sky, such as polarized light, that we humans cannot see) with odometric cues to estimate their current position relative to the nest.
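
To make the idea concrete, here is a minimal sketch of path integration as simple dead reckoning (an illustrative analogy with made-up step data, not the neural mechanism from our model): each step's compass heading and traveled distance are accumulated into a running position estimate, from which a home vector can be read off at any time.

```python
import math

def integrate_path(steps):
    """Accumulate (heading, distance) steps into a position estimate.

    steps: iterable of (heading_radians, distance) pairs, e.g. from a
    compass and an odometer. Returns the position (x, y) relative to the
    start and the "home vector" (angle, length) pointing back to the nest.
    """
    x = y = 0.0
    for heading, distance in steps:
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
    home_angle = math.atan2(-y, -x)   # direction back to the start
    home_length = math.hypot(x, y)    # distance back to the start
    return (x, y), (home_angle, home_length)

# Hypothetical foraging run: wander outward, then read off the home vector.
outbound = [(0.0, 2.0), (math.pi / 2, 1.5), (math.pi / 4, 1.0)]
position, home = integrate_path(outbound)
print(position, home)
```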

Path integration is not only used to return safely to the nest; it also supports the learning of so-called vector memories. Such memories have been shown to be sufficient to produce goal-directed navigation in ants and bees. Since these abilities allow the insects to navigate over many hundreds of meters (ants) to multiple kilometers (bees), such control systems hold great potential for artificial agents.

Inspired by this idea, I began collaborating with Dennis Goldschmidt from the Champalimaud Centre for the Unknown in Portugal and Dr. Poramate Manoonpong from the University of Southern Denmark. In our paper, recently published in Frontiers in Neurorobotics, we present a neurocomputational model of goal-directed navigation for autonomous agents. Our simulated robots are able to learn and store vector memories based on path integration.

Results for vector-guided navigation generated by the proposed model. A) After two trials of random exploration, the agent learns to find the feeder. B) The synaptic strengths of the global vector (GV) array change due to learning over the five trials. The estimated angle (cyan line) is the output of the GV array; the actual angle to the feeder is indicated by the red dashed line. Below are the exploration rates and food reward signals with respect to time. C) Goal-directed navigation in an environment with randomly placed feeders. As the exploration rate (green line) decreases to zero, the agent learns to navigate towards a nearby feeder. D) Route formation using local vector learning.

From bees to bots

Our virtual agent’s mathematical model consists of a biologically plausible neural network whose learning rules represent vectors as activity patterns across a circular array of neurons. The path integration mechanism receives input from a compass sensor and a speedometer on the agent. Integrating these inputs within the network produces an activity pattern that represents a vector pointing to the agent’s current location.
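
As a rough illustration of this kind of representation (a simplified sketch, not the exact network from the paper), a heading can be encoded as a cosine-shaped activity bump over a ring of neurons with different preferred directions; weighting that bump by the agent's speed and accumulating it over time yields an activity pattern whose population vector points from home to the agent's current location.

```python
import numpy as np

N = 36  # neurons on the ring, each with a preferred direction
preferred = np.linspace(0, 2 * np.pi, N, endpoint=False)

def heading_activity(heading):
    """Cosine tuning: each neuron fires according to how close the
    current heading is to its preferred direction (rectified at zero)."""
    return np.maximum(np.cos(preferred - heading), 0.0)

def decode(activity):
    """Population-vector readout: angle and length of the encoded vector."""
    x = np.sum(activity * np.cos(preferred))
    y = np.sum(activity * np.sin(preferred))
    return np.arctan2(y, x), np.hypot(x, y)

# Path integration as accumulation of speed-weighted heading activity.
# Hypothetical compass/speedometer samples: (heading in radians, speed).
samples = [(0.0, 1.0)] * 10 + [(np.pi / 2, 1.0)] * 5
memory = np.zeros(N)
for heading, speed in samples:
    memory += speed * heading_activity(heading)

angle, length = decode(memory)
print(np.degrees(angle))  # roughly the direction of travel away from home
```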

Complex adaptive behavior with learning on the simulated AMOS-II walking robot (Dasgupta et al., 2015).

Short video showing the neurocomputational model of goal-directed navigation implemented on a simulated hexapod. The agent uses path integration to learn vector memories that can guide it across long distances, reach a target or navigate around obstacles, and safely return to its home location without any global position information.

All the components of the neural model run on a standard laptop, with two types of artificial agents created in simulation. The initial simulations used situated point agents mimicking the behavior of insects within a two-dimensional environment. We then tested the path integration and navigation learned by the model on a simulated hexapod robot with 19 degrees of freedom in its limbs, within a three-dimensional physics-based simulator. This demonstrated the efficacy of the mathematical model and its implementation on complex walking machines. In collaboration with Dr. Manoonpong’s team at the University of Southern Denmark, we will implement this navigational system on the insect-inspired physical robot AMOS-II.

Open-source multi-sensori-motor robotic platform AMOS-II (Advanced Mobility Sensor Driven-Walking Device version II). Courtesy of Dr. Poramate Manoonpong (SDU) and the Bernstein Center for Computational Neuroscience, Göttingen.

Learning with rewards

Our agent also uses a reward-based learning rule that reinforces vector memories acquired from path integration. In insects, such rewards would be food locations. The model in our study not only reproduces goal-directed navigation and route formation in the agent, but can also predict the navigational behavior of insects. More importantly, it offers a simple computational framework for decision-making applications in real-world navigating agents.

In the simulation, rewards are provided as positive signals that the agent learns to associate with sensory cues. In the real world, such navigation will be applied to mobile robots by reinforcing certain locations — say, a doorstep to deliver packages — based on visual or other sensory cues. All the while, the robot can keep continuous track of its home location using our path integration mechanism, even in the absence of global positioning systems.
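
As a loose illustration of such reward-modulated association (a hypothetical sketch, not the learning rule published in the paper), the agent can nudge a stored goal vector toward its current path-integration state whenever a reward arrives, so that repeated rewarded visits consolidate the memory of that location.

```python
import numpy as np

def update_goal_memory(goal_memory, pi_state, reward, learning_rate=0.1):
    """Reward-modulated update: move the stored goal vector toward the
    current path-integration state, scaled by the reward signal.

    goal_memory: stored (x, y) vector memory of a rewarded location.
    pi_state:    current (x, y) estimate from path integration.
    reward:      scalar reward signal (e.g. 1.0 at a food site or doorstep).
    """
    goal_memory = np.asarray(goal_memory, dtype=float)
    pi_state = np.asarray(pi_state, dtype=float)
    return goal_memory + learning_rate * reward * (pi_state - goal_memory)

# Hypothetical trials: the agent finds a reward near (4, 3) several times.
memory = np.zeros(2)
for noisy_visit in [(4.2, 2.9), (3.8, 3.1), (4.0, 3.0)]:
    memory = update_goal_memory(memory, noisy_visit, reward=1.0)
print(memory)  # drifts toward the rewarded location across visits
```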

Closing the loop

Furthering this research, we are building a novel closed-loop learning framework, inspired by biological systems, that integrates different types of learning mechanisms for agent decision-making. The framework will allow reinforcement learning to work in a closed loop with other mechanisms, such as supervised learning, much as different learning systems interact in the human brain.

A closed-loop network that allows feedback between different learning mechanisms (reinforcement learning, imitation learning, and unsupervised learning) makes learning more efficient, allowing autonomous agents to pick up new tasks much more quickly. Most current work in deep learning and deep reinforcement learning has focused on using and optimizing a single learning mechanism. A brain-inspired closed-loop approach, however, may provide a much more efficient and scalable learning framework.

Dr. Sakyasingha Dasgupta is the principal scientist leading the research team of Tokyo-based startup LeapMind, Inc.

A fuller version of this story originally appeared on Medium. Copyright 2017.