OpenAI updated the default model for ChatGPT to its new GPT-5.5 Instant, along with a new memory capability that finally shows which context shaped responses — at least some of them.
This limitation signals that models are starting to create a second, incomplete memory observability layer that could conflict with existing audit systems and agent logs.
GPT-5.5 Instant replaces GPT-5.3 Instant as the default ChatGPT model and is a version of OpenAI's new flagship GPT-5.5 LLM. The company says it's more dependable, more accurate and smarter than 5.3.
But it’s the introduction of memory sources, which will be enabled across all models on the platform, that could matter most for enterprise deployments.
“When a response is personalized, you can see what context was used, such as saved memories or past chats, and delete or correct it if something is outdated or no longer relevant,” OpenAI said in a blog post.
When users ask ChatGPT something, they can tap the sources button (at the bottom of the response) to see which files or past chats the model drew on to find the answer. Users also have full control over the sources models can cite, and these sources will not be shared if the conversation is sent to others.
The company said memory sources should make it easier to personalize model responses. Still, OpenAI admitted that the models “may not show every factor that shaped an answer” and promised to make the capability more comprehensive over time.
What this means is that memory sources offer a semblance of observability in ChatGPT answers, but not full auditability yet.
Competing memory systems
Enterprises have a system in place to solve part of the memory and context problem with models and agents. Models are exposed to context through retrieval-augmented generation (RAG) pipelines; whatever the agent fetches from the vector databases is logged, and the agent's state is stored in a memory layer. All of this is tracked in application logs, usually in an orchestration or management layer with built-in observability. Ideally, this allows teams to trace failure back through the stack.
The current system is imperfect; sometimes, it's not easy to trace failure points, but it’s at least internally consistent. For enterprises using ChatGPT, whether the default GPT-5.5 Instant or their model of choice, that’s no longer the case.
The model surfaces its own version with memory sources that are wholly separate from existing retrieval logs — in short, a model-reported context. A problem arises if these cannot be reconciled reliably. And because memory sources only give users part of the picture — it’s unclear what ChatGPT’s limit on citing memory sources is — it becomes even harder to match what GPT-5.5 Instant said it tapped to what it actually did in the production environment.
This situation creates a new failure mode: a competing context log. When something goes wrong, the two records may disagree, leaving enterprises to sort out which one reflects what actually happened.
Malcolm Harkins, chief trust and security officer at HiddenLayer, told VentureBeat that memory sources "look like a pragmatic middle ground" in offering some transparency, but that their value for enterprises is harder to pin down.
"For enterprises, it's directionally useful but insufficient on its own," Harkins said. "Real value will depend on how it integrates with security, governance, access controls and audit systems."
A more capable default model
However GPT-5.5 Instant handles memory, OpenAI bills the model itself as an improvement over GPT-5.3 Instant.
Internal evaluations showed GPT-5.5 Instant returned 52.5% fewer hallucinated claims than the previous default model, especially for high-stakes domains such as medicine, law, and finance. Inaccurate claims fell by 37.3% on challenging conversations. The company said the model improved on photo analysis and image uploads, answering STEM questions and knowing when to tap its own knowledge base or use web search.
Peter Gostev of independent model evaluator Arena explained to VentureBeat in an email that the key result to watch for GPT-5.5 Instant is how it performs on the overall text rankings, especially because its predecessor did not have a strong showing.
“Since GPT-4o, the strongest-performing OpenAI chat model on the Arena has been GPT-5.2-Chat, which still ranks 12th on the Overall Text Arena months after release," Gostev said. Notably, users preferred it even over the higher-reasoning GPT-5.2-High variant, which is currently ranked 52nd on the Arena. “By comparison, GPT-5.3-Chat, the previous default model in ChatGPT, was significantly less competitive, ranking 44th overall, 32 places below GPT-5.2-Chat.”
What enterprises need to do about memory sources
Organizations that rely on ChatGPT for some tasks will need to formalize how memory works for their stack. Memory sources are not limited to GPT-5.5 Instant; the capability is enabled for all models on the ChatGPT platform.
To address the problem of competing memory sources, enterprises have to audit how memory is managed across their stack. Model-reported context could overlap with, or contradict, their existing retrieval and application logs, so it’s best to define a clear source of truth. That way, in the event of a failure, administrators know which log to believe.
It would also be a good idea to decide whether to expose memory sources to users. ChatGPT only shows a select number of the chats or files it used to complete a request, so the transparency is partial, but some users may still find that it builds trust.
Ultimately, the number one thing for enterprises to remember about memory sources is that what the model reports as its context is not the full picture for auditing. It’s a form of observability, but it cannot substitute for a complete audit trail.
