What OpenAI and GitHub’s 'AI pair programmer' means for the software industry


OpenAI has once again made the headlines, this time with Copilot, an AI-powered programming tool jointly built with GitHub. Built on top of GPT-3, OpenAI’s famous language model, Copilot is an autocomplete tool that provides relevant (and sometimes lengthy) suggestions as you write code.

Copilot is currently available to select applicants as an extension in Visual Studio Code, the flagship programming tool of Microsoft, GitHub’s parent company.

While the AI-powered code generator is still a work in progress, it provides some interesting hints about the business of large language models and the future directions of the software industry.

Not the intended use for GPT-3

The official website of Copilot describes it as an “AI pair programmer” that suggests “whole lines or entire functions right inside your editor.” Sometimes, just providing a function signature or description is enough to generate an entire block of code.

Working behind Copilot is a deep learning model called Codex, which is essentially a special version of GPT-3 finetuned for programming tasks. The tool works much like GPT-3: It takes a prompt as input and generates a sequence of tokens as output. Here, the prompt (or context) is the source code file you’re working on, and the output is the code suggestion you receive.
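To make the prompt-and-completion loop more concrete, here is a minimal Python sketch. It assumes the prompt is simply the code before the cursor, trimmed to a fixed budget. The names `build_prompt`, `suggest`, and `MAX_PROMPT_CHARS` are hypothetical, the model is a stand-in callable, and none of this claims to reflect how Copilot actually assembles its context.

```python
from typing import Callable

# Assumed context budget. Real language models measure context in tokens,
# not characters; characters keep this sketch dependency-free.
MAX_PROMPT_CHARS = 2000


def build_prompt(file_text: str, cursor: int, max_chars: int = MAX_PROMPT_CHARS) -> str:
    """Use the code before the cursor as the prompt, keeping only its last max_chars characters."""
    prefix = file_text[:cursor]
    if max_chars <= 0:
        return ""
    # Keep the tail: the code closest to the cursor is usually the most relevant context.
    return prefix[-max_chars:]


def suggest(file_text: str, cursor: int, model: Callable[[str], str]) -> str:
    """Send the prompt to a text-completion model and return its continuation as the suggestion."""
    return model(build_prompt(file_text, cursor))


if __name__ == "__main__":
    source = 'def add(a, b):\n    """Return the sum of a and b."""\n'
    # Hypothetical stand-in for a real model: always returns the same continuation.
    stub_model = lambda prompt: "    return a + b\n"
    print(source + suggest(source, len(source), stub_model))
```

The design choice worth noting is that the function signature and docstring are part of the prompt, which is why a descriptive comment or signature alone can steer the suggestion.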

What’s interesting in all of this is how unexpected the turns of AI product management can be. According to CNBC: “…back when OpenAI was first training [GPT-3], the start-up had no intention of teaching it how to help code, [OpenAI CTO Greg] Brockman said. It was meant more as a general purpose language model [emphasis mine] that could, for instance, generate articles, fix incorrect grammar and translate from one language into another.”

General-purpose language applications have proven to be very hard to nail. There are many intricacies involved when applying natural language processing to broad environments. Humans tend to use a lot of abstractions and shortcuts in day-to-day language. The meaning of words, phrases, and sentences can vary based on shared sensory experience, work environment, prior knowledge, etc. These nuances are hard for deep learning models to capture, because they are trained to learn the statistical regularities of a very large dataset of anything and everything.

In contrast, language models perform well when they’re provided with the right context and their application is narrowed down to a single or a few related tasks. For example, deep learning–powered chatbots trained or finetuned on a large corpus of customer chats can be a decent complement to customer service agents, taking on the bulk of simple interactions with customers and leaving complicated requests to human operators. There are already plenty of special-purpose deep learning models for different language tasks.

Therefore, it’s not very surprising that the first applications for GPT-3 have been something other than general-purpose language tasks.

Using language models for coding

Shortly after GPT-3 was made available through a beta web application programming interface, many users posted examples of using the language model to generate source code. These experiments displayed an unexplored side of GPT-3 and a potential use case for the large language model.

And interestingly, the first two applications that Microsoft, the exclusive license holder of OpenAI’s language models, created on top of GPT-3 are related to computer programming. In May, Microsoft announced a GPT-3-powered tool that generates queries for its Power Apps. And now, it is testing the waters with Copilot.

Neural networks are very good at finding and suggesting patterns from large training datasets. In this light, it makes sense to use GPT-3 or a finetuned version of it to help programmers find solutions in the very large corpus of publicly available source code on GitHub.

According to Copilot’s homepage, Codex has been trained on “a selection of English language and source code from publicly available sources, including code in public repositories on GitHub.”

If you provide it with the right context, it can come up with a block of code that resembles what other programmers have written to solve a similar problem. And giving it more detailed comments and descriptions will improve your chances of getting a reasonable output from Copilot.

This doesn’t mean Copilot “understands” the code it suggests, though. Language models such as GPT-3 do not understand the structure of source code or the purpose of programs. They can’t come up with new ideas, break down a problem into smaller components, or design and build an application the way human software engineers do.

By human standards, programming is a relatively difficult task (well, it used to be when I was learning in the 90s). It requires careful thinking, logic, and architecture design to solve a specific problem. Each language has its own paradigms and programming patterns. Developers must learn to use different application programming interfaces and plug them together in an efficient way. In short, it’s a skill that is largely dependent on symbol manipulation, an area that is not the forte of deep learning algorithms.

Copilot’s creators acknowledge that their AI system is in no way a perfect programming companion (I don’t even think “pair programming” is the right term for it). “GitHub Copilot doesn’t actually test the code it suggests, so the code may not even compile or run,” they warn.

GitHub also warns that Copilot may suggest “old or deprecated uses of libraries and languages,” which can cause security issues. This makes it extremely important for developers to review the AI-generated code thoroughly.

So, we’re not at a stage to expect AI systems to automate programming. But pairing them with humans who know what they’re doing can surely improve productivity, as Copilot’s creators suggest.

And since Copilot’s technical preview launched, developers have posted all kinds of examples, ranging from amusing to genuinely useful.

“If you know a bit about what you’re asking Copilot to code for you, and you have enough experience to clean up the code and fix the errors that it introduces, it can be very useful and save you time,” Matt Shumer, co-founder and CEO of OthersideAI, told TechTalks.

But Shumer also warns against blindly trusting the code generated by Copilot.

“For example, it saved me time writing SQL code, but it put the database password directly in the code,” Shumer said. “If I wasn’t experienced, I might accept that and leave it in the code, which would create security issues. But because I knew how to modify the code, I was able to use what Copilot gave me as a starting point to work off of.”
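Shumer’s example describes a common mistake that reviewers should watch for in generated code. As a minimal Python sketch of the safer habit, the password below is read from an environment variable at runtime instead of being written into the source. The variable name `DB_PASSWORD` and the helpers `get_db_password` and `connection_settings` are hypothetical, and no particular database driver is assumed.

```python
import os

# Risky pattern: the secret ships with the source code and ends up in version control.
# DB_PASSWORD = "hunter2"


def get_db_password(var_name: str = "DB_PASSWORD") -> str:
    """Read the database password from the environment instead of hardcoding it."""
    password = os.environ.get(var_name)
    if not password:
        # Fail loudly rather than silently connecting with an empty password.
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return password


def connection_settings(host: str, user: str) -> dict:
    """Build connection parameters for a (hypothetical) database driver."""
    return {"host": host, "user": user, "password": get_db_password()}
```

Generated code that contains a literal password is a signal to stop and refactor along these lines, or to use whatever secrets store the project already relies on.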

The business model of Copilot

In my opinion, there’s another reason why Microsoft chose programming as the first application for GPT-3: there’s a huge opportunity to cut costs and make profits.

According to GitHub, “If the technical preview is successful, our plan is to build a commercial version of GitHub Copilot in the future.”

There’s still no information on how much the official Copilot will cost. But hourly wages for programming talent start at around $30 and can reach as high as $150. Even saving a few hours of programming time or giving a small boost to development speed would probably be enough to cover the costs of Copilot. Therefore, it would not be surprising if many developers and software development companies signed up for Copilot once it is released as a commercial product.

“If it gives me back even 10 percent of my time, I’d say it’s worth the cost. Within reason, of course,” Shumer said.
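A quick back-of-the-envelope calculation puts Shumer’s 10 percent figure in perspective, using the $30 to $150 hourly range cited above. The 40-hour work week is an assumption, and the helper name `weekly_value` is hypothetical: at 10 percent of 40 hours, a developer gets back 4 hours a week, worth $120 at $30/hour and $600 at $150/hour.

```python
HOURS_PER_WEEK = 40  # assumption: a full-time schedule


def weekly_value(hourly_rate: float, fraction_saved: float,
                 hours_per_week: float = HOURS_PER_WEEK) -> float:
    """Dollar value of the programming time saved per week."""
    return hourly_rate * hours_per_week * fraction_saved


for rate in (30, 150):  # low and high ends of the hourly range cited in the article
    value = weekly_value(rate, 0.10)
    print(f"${rate}/hour, 10% saved: ${value:,.0f} per week")
```

Any subscription priced below these weekly figures would pay for itself under these assumptions, which is the intuition behind Shumer’s remark.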

Language models like GPT-3 require extensive resources to train and run. And they also need to be regularly updated and finetuned, which imposes more expenses on the company hosting the machine learning model. Therefore, high-cost domains such as software development would be a good place to start, reducing the time it takes to recoup the investment in the technology.

“The ability for [Copilot] to help me use libraries and frameworks I’ve never used before is extremely valuable,” Shumer said. “In one of my demos, for example, I asked it to generate a dashboard with Streamlit, and it did it perfectly in one try. I could then go and modify that dashboard, without needing to read through any documentation. That alone is valuable enough for me to pay for it.”

Automated coding can turn out to be a multi-billion-dollar industry. And Microsoft is positioning itself to take a leading role in this nascent sector, thanks to its market reach (through Visual Studio, Azure, and GitHub), deep pockets, and exclusive access to OpenAI’s technology and talent.

The future of automated coding

Developers must be careful not to mistake Copilot and other AI-powered code generators for a programming companion whose every suggestion can be accepted without review. As a programmer who has worked under tight deadlines on several occasions, I know that developers tend to cut corners when they’re running out of time (I’ve done it more than a few times). And if you have a tool that gives you a big chunk of working code in one fell swoop, you’re prone to just skim over it if you’re short on time.

Meanwhile, adversaries might find ways to identify vulnerable coding patterns in deep learning code generators and discover new attack vectors against AI-generated software.

New coding tools create new habits (many of them negative and insecure). We must carefully explore this new space and beware the possible tradeoffs of having AI agents as our new coding partners.

Ben Dickson is a software engineer and the founder of TechTalks, a blog that explores the ways technology is solving and creating problems.
