The eye movements we make while engaged in cognitively demanding tasks communicate a lot about us. Metrics like spontaneous blink rate, pupil dilation, and gaze direction shed light on attentional focus and personality, among other things, and might even serve as early biomarkers for neurodegenerative conditions like Parkinson’s and Alzheimer’s.
Inspired by prior work, including a 2018 study that sought correlations between expertise in the visual arts and oculomotor movement, researchers at the Nara Institute of Science and Technology in Japan recently applied eye movement research to the software development domain. In a paper published on the preprint server arXiv.org (“Toward Imitating Visual Attention of Experts in Software Development Tasks”), they describe an AI framework designed to create agents that learn from programmers’ eye movements to fix bugs, produce patches, and write comments.
“In the last three decades, we have gained a lot of insight by knowing where a programmer is allocating visual attention, which can be inferred from eye movement data,” the researchers wrote. “We have already known that programmers use attention strategies to save time for program comprehension and maintenance. For example, expert programmers tend to automatically concentrate their attention onto informative parts of a program and skim only the relevant keywords in source code. Incorporating gaze-fixation data allows autonomous agents to learn attention strategies that are hard to learn solely from textual characteristics.”
The team’s proposed method leverages imitation learning, in which autonomous agents glean knowledge about complex tasks from human demonstrations. In this case, the agents are represented by a model trained using behavioral cloning, an algorithm commonly employed in robotics and natural language processing. Snippets of code and the programming environment are treated as a sequence of tokens, or keywords, and the agents are constrained so that they focus on a particular subset of tokens, mimicking an expert programmer’s visual attention: they take the current state as input and output the desired action. From that point, they’re adapted to perform specific tasks.
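At its core, behavioral cloning is supervised learning over (state, action) pairs harvested from demonstrations. The toy sketch below is not the paper’s model; it illustrates the setup with hypothetical gaze traces over source-code tokens, using a simple frequency table in place of a neural policy (all token names are invented for illustration):

```python
from collections import defaultdict, Counter

def train_behavioral_clone(demonstrations):
    """Fit a toy policy from expert traces.

    Each demonstration is a sequence of tokens the expert attended to;
    the 'state' is the current token and the 'action' is the next token
    attended. A real system would train a neural network on these pairs;
    this frequency table only illustrates the supervised framing.
    """
    counts = defaultdict(Counter)
    for trace in demonstrations:
        for state, action in zip(trace, trace[1:]):
            counts[state][action] += 1
    # The cloned policy maps each state to the expert's most common action.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

def act(policy, state, default=None):
    """Return the cloned action for a state, or a fallback for unseen states."""
    return policy.get(state, default)

# Hypothetical gaze traces: which code tokens an expert looked at, in order.
demos = [
    ["def", "name", "args", "return"],
    ["def", "name", "docstring", "return"],
    ["def", "name", "args", "body"],
]
policy = train_behavioral_clone(demos)
```

A policy this simple cannot generalize, which is exactly why the paper swaps the lookup table for recurrent neural networks that condition on the whole token sequence.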
The agents in question comprise two deep neural networks, or layers of mathematical functions modeled after neurons in the brain: a recurrent neural network (RNN) that encodes the global context of a given code snippet, and a task-specific decoder (also an RNN) that uses the encoded data from the first RNN to predict the next tokens to attend to as the action. The researchers acknowledge that their approach requires a large number of demonstrations. As an alternative, they propose using a generative adversarial network (a two-part neural network consisting of a generator that produces samples and a discriminator that attempts to distinguish the generated samples from real-world samples) that would learn from expert demonstrations rather than merely mimicking their actions. They also suggest complementing visual attention data with electroencephalography (EEG) readings. Ultimately, they believe that the framework, if implemented in a production environment, could improve AI agents’ performance on a range of software development tasks.
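The encoder-decoder split described above can be sketched with two tiny Elman-style RNN cells. This is a structural illustration only, with untrained random weights and an invented four-token vocabulary; the paper’s actual architecture, dimensions, and training procedure are not reproduced here:

```python
import math
import random

random.seed(0)

def linear(x, W, b):
    """y = Wx + b for plain-list vectors and matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

class TinyRNN:
    """Minimal Elman RNN cell, standing in for the paper's encoder/decoder RNNs."""

    def __init__(self, in_dim, hid_dim):
        rnd = lambda n: [random.uniform(-0.1, 0.1) for _ in range(n)]
        self.Wx = [rnd(in_dim) for _ in range(hid_dim)]   # input weights
        self.Wh = [rnd(hid_dim) for _ in range(hid_dim)]  # recurrent weights
        self.b = [0.0] * hid_dim

    def step(self, x, h):
        # h' = tanh(Wx·x + Wh·h + b)
        pre = [a + c for a, c in zip(linear(x, self.Wx, self.b),
                                     linear(h, self.Wh, [0.0] * len(h)))]
        return [math.tanh(v) for v in pre]

    def run(self, xs):
        h = [0.0] * len(self.b)
        for x in xs:
            h = self.step(x, h)
        return h

def one_hot(i, n):
    v = [0.0] * n
    v[i] = 1.0
    return v

# Encoder summarizes the token sequence into a fixed-size context vector;
# the decoder scores which token to attend to next (untrained weights).
VOCAB = ["def", "name", "args", "return"]  # hypothetical code tokens
encoder = TinyRNN(in_dim=len(VOCAB), hid_dim=8)
context = encoder.run([one_hot(i, len(VOCAB)) for i in range(len(VOCAB))])

decoder = TinyRNN(in_dim=8, hid_dim=len(VOCAB))
scores = decoder.step(context, [0.0] * len(VOCAB))
next_token = VOCAB[max(range(len(VOCAB)), key=scores.__getitem__)]
```

In the trained system, the decoder’s scores would be fit to expert gaze data so that the argmax reproduces which token the expert would look at next.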
“A baby learns numerous things from [demonstrations] by parents without any [verbal] explanations, because demonstrations can represent more than language descriptions. So far, researchers investigated eye movements of programmers and typically converted them into human-understandable numbers and descriptions, [but] this conversion has caused considerable information loss,” the paper’s authors wrote. “We believe that [imitation learning]-based agents can fully utilize the valuable information sources with less information loss.”