
OpenAI today released OpenAI Codex, its AI system that translates natural language into code, through an API in private beta. Able to understand more than a dozen programming languages, Codex can interpret commands in plain English and execute them, making it possible to build a natural language interface for existing apps.

Codex powers Copilot, a GitHub service launched earlier this summer that provides suggestions for whole lines of code inside development environments like Microsoft Visual Studio. Codex is trained on billions of lines of public code and works with a broad set of frameworks and languages, adapting to the edits developers make to match their coding styles.

According to OpenAI, the Codex model available via the API is most capable in Python but is also “proficient” in JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, Shell, and others. Its memory, 14KB for Python code, enables it to take contextual information into account while performing programming tasks including transpilation, explaining code, and refactoring code.



OpenAI says that Codex will be offered for free during the initial period. “Codex empowers computers to better understand people’s intent, which can empower everyone to do more with computers,” the company wrote in a blog post. “We are now inviting businesses and developers to build on top of OpenAI Codex through our API.”

Potentially problematic

Although Codex is highly capable, a recent paper published by OpenAI reveals that it might have significant limitations, including biases and sample inefficiencies. The company’s researchers found that the model can propose syntactically incorrect or undefined code, invoking variables and attributes that are undefined or outside the scope of a codebase. More concerning, Codex sometimes suggests solutions that appear superficially correct but don’t actually perform the intended task. For example, when asked to create encryption keys, Codex selects “clearly insecure” configuration parameters in “a significant fraction of cases” and recommends compromised packages as dependencies.
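To make the encryption-key concern concrete, here is a minimal sketch of the kind of check a code reviewer might run against a suggested key configuration. It is not taken from OpenAI's paper: the function name `review_key_config`, the constant `MIN_RSA_BITS`, and the 2048-bit threshold are illustrative assumptions based on common industry guidance.

```python
from typing import List

# Assumed minimum RSA key size; a widely used baseline, not a figure from the article.
MIN_RSA_BITS = 2048


def review_key_config(algorithm: str, key_bits: int) -> List[str]:
    """Return warnings for a proposed key configuration (hypothetical helper)."""
    warnings = []
    # Flag RSA keys shorter than the assumed minimum size.
    if algorithm.upper() == "RSA" and key_bits < MIN_RSA_BITS:
        warnings.append(
            f"RSA key of {key_bits} bits is below the assumed minimum of {MIN_RSA_BITS}"
        )
    return warnings


if __name__ == "__main__":
    # A suggestion that looks plausible but uses a short key gets flagged.
    print(review_key_config("RSA", 1024))
    # A 2048-bit key passes this particular check.
    print(review_key_config("RSA", 2048))
```

A check like this only catches one class of problem. Code that runs and looks plausible can still need human review before it is trusted.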


Like other large language models, Codex generates responses as similar as possible to its training data, leading to obfuscated code that looks good on inspection but actually does something undesirable. Specifically, OpenAI found that Codex can be prompted to generate racist and otherwise harmful outputs as code. Given the prompt “def race(x):,” OpenAI reports that Codex assumes a small number of mutually exclusive race categories in its completions, with “White” being the most common, followed by “Black” and “Other.” And when writing code comments with the prompt “Islam,” Codex includes the words “terrorist” and “violent” at a greater rate than it does with other religious groups.

Perhaps anticipating criticism, OpenAI asserted in the paper that risk from models like Codex can be mitigated with “careful” documentation and user interface design, code review, and content controls. In the context of a model made available as a service — e.g., via an API — policies including user review, use case restrictions, monitoring, and rate limiting might also help to reduce harms, the company said.


In a previous statement, an OpenAI spokesperson told VentureBeat that the company was “taking a multi-prong approach” to reduce the risk of misuse of Codex, including limiting the frequency of requests to prevent automated usage that may be malicious. The company also said that it would update its safety tools and policies as it makes Codex available through the API and monitors the launch of Copilot.
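OpenAI has not said how it limits request frequency. As a rough illustration of the general idea, the sketch below uses a token bucket, one common rate-limiting technique. The class name `TokenBucket` and its parameters are hypothetical and do not describe OpenAI's actual system.

```python
import time


class TokenBucket:
    """Illustrative token-bucket rate limiter (an assumption, not OpenAI's method)."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity                    # maximum burst size
        self.refill_per_second = refill_per_second  # sustained request rate
        self.clock = clock                          # injectable for testing
        self.tokens = float(capacity)               # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if a request may proceed, consuming one token."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(capacity=2, refill_per_second=1.0)
    # Three back-to-back requests: the bucket only holds two tokens.
    print([bucket.allow() for _ in range(3)])
```

A client that sends requests faster than the refill rate is rejected once the burst allowance is spent, which is how a limit of this kind can slow automated usage.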
