Study provides insights on GitHub Copilot’s impact on developer productivity

Writing software code has recently emerged as a promising use case for large language models like GPT-3. At the same time, as with many developments in artificial intelligence (AI), there are concerns about how much of the excitement surrounding large language model (LLM)-powered coding is hype.

A new study by GitHub shows that Copilot, its AI programming assistant, results in a significant increase in developer productivity and happiness. Copilot uses Codex, a specialized version of GPT-3 trained on gigabytes of software code, to autocomplete instructions, generate entire functions, and automate other parts of writing source code.

The study comes one year after GitHub launched the technical preview of its Copilot tool and just a few months after it became publicly available. GitHub's study surveyed more than 2,000 programmers, mostly professional developers and students, who have used Copilot over the past year.

While AI-assisted coding is still a new field and needs more research, GitHub’s study provides a good look at what to expect from tools such as Copilot.

Happiness and productivity

According to GitHub's findings, 60–75% of developers feel “more fulfilled with their job, feel less frustrated when coding, and can focus on more satisfying work” when using its Copilot tool.

Feeling fulfilled and satisfied is a subjective experience, though there are some common traits across what developers reported.

“Knowledge workers in general – and that includes software developers – are intrigued and motivated by problem-solving, and creativity,” GitHub researcher Eirini Kalliamvakou told VentureBeat. “For example, a developer tends to find it more satisfying to think about what design patterns to use, or how to architect a solution that implements a particular logic, drives an outcome, or solves a problem. Compared to that, the rote memorization of syntax or ordering of parameters is considered ‘toil’ that most developers would love to get through quickly.”

Copilot also helps developers “preserve mental effort during repetitive tasks,” 87% of the respondents reported. These are tasks that are frustrating and prone to mistakes, such as writing a SQL migration to update the schema of a database.

“With the exception of database administrators, developers may not write SQL migrations often enough to remember all of the particular SQL syntaxes,” Kalliamvakou said. “But it’s a task that happens often enough for the mental cost of the non-immediate recall to add up. GitHub Copilot removes much of the effort in this scenario.”
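For readers unfamiliar with the task, the following is a minimal sketch of what such a schema migration can look like. It is illustrative only and not taken from GitHub's study: the `users` table, the `email` column, and the function names are hypothetical, and Python's built-in `sqlite3` module stands in for whatever database a team actually uses.

```python
import sqlite3


def users_columns(conn: sqlite3.Connection) -> list:
    """Return the column names of the hypothetical `users` table, in order."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute("PRAGMA table_info(users)")]


def migrate_add_email_column(conn: sqlite3.Connection) -> None:
    """Add a nullable `email` column to `users` unless it already exists."""
    if "email" not in users_columns(conn):
        conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
        conn.commit()


if __name__ == "__main__":
    # Throwaway in-memory database, used only to demonstrate the migration.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    migrate_add_email_column(demo)
    migrate_add_email_column(demo)  # second run is a no-op
    print(users_columns(demo))
```

The existence check makes the migration safe to run more than once, which is the kind of detail that is easy to forget when the syntax is not fresh in mind.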

Developers tend to “stay in the flow” when using Copilot, the survey found, meaning they spend less time browsing reference documents and online forums like Stack Overflow to find solutions. Instead, they prompt Copilot with a text description and get code that is mostly correct and might need a bit of tweaking.

Faster task completion

More than 90% of the survey’s respondents reported that Copilot helps them complete tasks faster, a finding that was expected. To measure the speed improvement more rigorously, however, GitHub conducted a controlled experiment, recruiting 95 developers and giving them the task of writing a basic HTTP/1.1 server from scratch in JavaScript.

The participants were divided into two groups: a test group of 45 developers who used Copilot and a control group of 50 developers who did not use the AI assistant. While the task completion rate was not overwhelmingly different between the two groups, completion time was. The Copilot group was able to complete the server code in less than half the time it took the control group.
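To give a rough sense of what a task like this involves, here is a minimal sketch of the request-handling core of a basic HTTP/1.1 server. It is not the code from GitHub's experiment: participants worked in JavaScript, while this sketch uses Python, omits the socket loop, and uses hypothetical function names and routes chosen purely for illustration.

```python
from typing import Tuple


def parse_request_line(raw: bytes) -> Tuple[str, str, str]:
    """Split the first line of a request into method, target, and version."""
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    # Raises ValueError unless the line has exactly three space-separated parts.
    method, target, version = request_line.split(" ")
    return method, target, version


def build_response(status: int, reason: str, body: str) -> bytes:
    """Serialize a plain-text HTTP/1.1 response with a correct Content-Length."""
    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


def handle(raw: bytes) -> bytes:
    """Map raw request bytes to raw response bytes for a single hypothetical route."""
    try:
        method, target, version = parse_request_line(raw)
    except (ValueError, UnicodeDecodeError):
        return build_response(400, "Bad Request", "bad request\n")
    if version != "HTTP/1.1":
        return build_response(505, "HTTP Version Not Supported", "unsupported version\n")
    if method != "GET":
        return build_response(501, "Not Implemented", "only GET is implemented\n")
    if target == "/":
        return build_response(200, "OK", "hello\n")
    return build_response(404, "Not Found", "not found\n")
```

Even this reduced version shows where the rote detail lies: line endings, header ordering, and byte-accurate content lengths are easy to get wrong from memory.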

While this is an important finding, it would be more interesting to see which types of tasks Copilot helped more with and which areas required more manual coding. Although GitHub did not have figures to share in this regard, Kalliamvakou told VentureBeat that she and her group are “performing more analysis on the code the participants wrote, and plan to share more in the near future.”

Code review and security

It is worth noting that LLMs do not understand and generate code in the same way that humans do, which has raised concerns among researchers. One of these concerns, which is also mentioned in the original Codex paper, is the possibility of AI tools providing erroneous and possibly insecure code suggestions. There are also concerns that, over time, developers could start accepting Copilot suggestions without reviewing the code it generates, which could introduce vulnerabilities and open new attack vectors.

While GitHub’s new study does not have any information on how Copilot affects secure coding practices, Kalliamvakou said that GitHub continues to work on improving the model and code suggestions. Meanwhile, she stressed that suggestions by GitHub Copilot should be “carefully tested, reviewed, and vetted, like any other code.”

“As GitHub Copilot improves, we will work to exclude insecure or low-quality code from the training set. We think in the long-term, Copilot will be writing more secure code than the average programmer,” Kalliamvakou said.

Kalliamvakou added that GitHub’s studies of Copilot have revealed new areas where AI can help developers, including support for Markdown, better interaction between Copilot and IntelliSense suggestions, and using the tool in other parts of the software development lifecycle, such as testing and code review.

“Our largest investment is in improving the model, and the quality of suggestions provided by GitHub Copilot since that is the source of the noticeable benefits our users experience,” Kalliamvakou said. “Over time, we expect that GitHub Copilot will be able to remove more of the boilerplate and repetitive coding that developers see as taxing, creating more room for job satisfaction and fulfillment.”