While ChatGPT continues setting new usage records, millions of enterprise customers also rely on its maker OpenAI's application programming interfaces (APIs) for building their own distinctive chatbot and agentic products atop the company's host of generative AI large language models (LLMs).
Previously, OpenAI launched a Completions API in 2020, a Chat Completions API in 2023, and an Assistants API in late 2023, allowing developers to hook their third-party applications and business software up to OpenAI's LLMs and interact with them much like regular users do with ChatGPT — and in many other ways that ChatGPT users cannot, such as building tailored agents that match a company's voice and branding and automatically draw on its documents.
But in March 2025, OpenAI debuted a new Responses API that it said would replace the Assistants API by 2026. It included many new capabilities built around the company's "reasoning" models, which take time to think and reflect on their responses before outputting them to users in order to improve accuracy and reduce unwanted hallucinations.
As VentureBeat reported at the time, the goal was to support real-world retrieval and interaction, allowing developers to leverage multiple built-in web "tools" or capabilities within a single API call — including web search, file search, and computer use — and make it easier to build applications that require multi-step reasoning.
Yet too many developers remain misinformed about the Responses API and are avoiding it as a result, according to Prashant Mital, Head of Applied AI at OpenAI.
As he wrote in a thread on X late last night: “there’s still way too much confusion about OpenAI’s Responses API. this is partly on us: we haven’t always been clear about why we built it, how to use it, and why it matters.” He then set out, post by post, to dispel several “myths” that some developers have about it.
In his explanation, Mital sought to clarify how Responses differs from Completions, why it offers performance advantages, and how it can be used even in stricter technical contexts like Zero Data Retention (ZDR).
What the Responses API Provides
One founder told Mital their team hadn’t switched over because “we want fine-grained control over our agentic processes – we handle context construction and prompt caching optimization ourselves.” Mital replied, “the reality? this is all possible with responses!”
He went on to lay out several “myths” about the API. Myth one: “it’s not possible to do some things with responses.”
His response: “nope. responses is a superset of completions. anything you can do with completions, you can do with responses – plus more. for example, you can manage the conversation state manually (like in completions) OR let the API manage state for you.”
Myth two was that Responses always keeps state and therefore cannot be used in strict cases where the customer (or their end-users/partners) must adhere to Zero Data Retention (ZDR) policies. In these kinds of setups, a company or developer requires that no user data is stored or retained on the provider’s servers after the request is processed.
This is important for industries like finance, healthcare, or government, where regulations or internal policies mandate strict control over data handling. In such contexts, every interaction must be stateless, meaning all conversation history, reasoning traces, and other context management happen entirely on the client side, with nothing persisted by the API provider.
Mital countered, “wrong. you can run responses in a stateless way. just ask it to return encrypted reasoning items, and continue handling state client-side.”
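A sketch of the stateless pattern Mital describes. The `store=False` parameter and the `include=["reasoning.encrypted_content"]` option come from OpenAI's published Responses documentation, but treat the exact field names as assumptions; the model output below is simulated, since the key idea is simply that the client echoes the opaque encrypted reasoning blobs back on the next turn:

```python
# Sketch: stateless (ZDR-compatible) use of the Responses API.
# store=False means nothing is retained server-side; encrypted reasoning
# items returned in the output are opaque blobs the client carries forward
# so the model can resume its chain of thought on the next request.

def build_stateless_request(context: list[dict], user_msg: str) -> dict:
    return {
        "model": "gpt-5",                            # illustrative
        "input": context + [{"role": "user", "content": user_msg}],
        "store": False,                              # Zero Data Retention
        "include": ["reasoning.encrypted_content"],  # get CoT back, encrypted
    }

def carry_forward(context: list[dict], output_items: list[dict]) -> list[dict]:
    """Append everything the model returned -- including encrypted
    reasoning items -- to the client-held context for the next turn."""
    return context + output_items

context: list[dict] = []
req = build_stateless_request(context, "summarize our Q3 numbers")
# Simulated model output: one encrypted reasoning item plus a message.
fake_output = [
    {"type": "reasoning", "encrypted_content": "gAAAAB..."},  # opaque blob
    {"type": "message", "role": "assistant", "content": "Here's the summary."},
]
context = carry_forward(context, fake_output)
```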
Performance and Cost Advantages
Mital also called out what he described as the most serious misconception: “myth #3: model intelligence is the same regardless of whether you use completions or responses. wrong again.”
He explained, “responses was built for thinking models that call tools within their chain-of-thought (CoT). responses allows persisting the CoT between model invocations when calling tools agentically — the result is a more intelligent model, and much higher cache utilization; we saw cache rates jump from 40-80% on some workloads.”
Mital described this as “perhaps the most egregious” misunderstanding, warning that “developers don’t realize how much performance they are leaving on the table. i get it, it’s hard because you use LiteLLM or some custom harness you built around chat completions or whatever, but prioritizing the switch is crucial if you want GPT-5 to be maximally performant in your agents.”
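The agent-loop pattern Mital alludes to can be sketched as follows, with a stub standing in for `client.responses.create(...)` and for the tool itself. The `previous_response_id` and `function_call_output` names are assumed from OpenAI's Responses documentation; the mechanism they illustrate is the one Mital describes, namely that chaining turns lets the server re-attach the hidden chain of thought instead of the model starting cold after every tool call:

```python
# Sketch of an agent loop where the chain of thought persists across
# tool calls. fake_model stands in for the real API: the first call asks
# for a tool, the second (chained via previous_response_id) answers.

def fake_model(request: dict) -> dict:
    if "previous_response_id" not in request:
        return {"id": "resp_1",
                "output": [{"type": "function_call", "name": "get_price",
                            "call_id": "call_1", "arguments": "{}"}]}
    return {"id": "resp_2",
            "output": [{"type": "message", "content": "AAPL is at $230."}]}

def run_agent(user_msg: str) -> str:
    request = {"model": "gpt-5", "input": user_msg}
    response = fake_model(request)
    while response["output"][0]["type"] == "function_call":
        call = response["output"][0]
        result = '{"price": 230}'  # stubbed tool execution
        # Chain to the prior response: the server re-attaches the hidden
        # CoT, so the model resumes thinking rather than starting over.
        request = {"model": "gpt-5",
                   "previous_response_id": response["id"],
                   "input": [{"type": "function_call_output",
                              "call_id": call["call_id"],
                              "output": result}]}
        response = fake_model(request)
    return response["output"][0]["content"]
```

This chaining is also where the cache-rate gains Mital cites would come from: the prior turn's tokens (and reasoning) are reused rather than resent and reprocessed.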
Resources for Developers
For developers evaluating a transition, Mital pointed to OpenAI’s practical documentation. He shared, “here’s our cookbook on function calling with responses: https://cookbook.openai.com/examples/o-series/o3o4-mini_prompting_guide.”
He also summed up the main message in one post:
“responses = completions ++”
“works in stateless & ZDR contexts”
“built for thinking models”
“unlocks higher intelligence and maximizes cache utilization in agent loops”
Looking Ahead
For teams continuing to build on Completions, Mital’s clarification may serve as a turning point. “if you’re still on chat completions,” he wrote, “consider switching now — you are likely leaving performance and cost-savings on the table.”
The Responses API is not just an alternative but an evolution, designed for the kinds of workloads that have emerged as AI systems take on more complex reasoning tasks. Developers evaluating whether to migrate may find that the potential for efficiency gains makes the decision straightforward.
