Business | VentureBeat

Canva launches Code 2.0, offering AI website building to every user — including free accounts

michael.nunez@venturebeat.com (Michael Nuñez) — Tue, 14 Jul 2026 13:00:00 GMT

Canva on Tuesday launched Canva Code 2.0, a major upgrade to its AI-powered coding tool that lets users build interactive websites, apps, and experiences using plain-language prompts — and then edit the results as easily as tweaking a Canva presentation. The feature is now available to all of the company's more than 265 million monthly users across every pricing tier, including free accounts.

The move is Canva's most aggressive push yet into the fast-growing "vibe coding" market, a category that barely existed 18 months ago but has already minted billion-dollar startups and reshaped how non-developers think about building software. But where rivals like Lovable, Replit, and Bolt.new have focused primarily on generating functional code from text prompts, Canva is making a different bet: that the real bottleneck isn't creating the code — it's making the output actually look good.

"Most vibe coding tools stop at functional — generating output that looks the same as everyone else's," Canva states in its announcement. "You might get a working prototype, but making it actually look like yours requires a complex editing surface, a separate design tool, a developer, or endless back-and-forth prompting that rarely lands where you want it."

Danny Wu, Canva's Head of AI Products, framed the product's positioning in stark terms during an exclusive interview with VentureBeat ahead of the launch.

"We are deliberately targeting non-technical users," Wu said. "Canva Code isn't a tool we're building for developers. What we're trying to do is bring the power of AI coding — and really lightweight coding — into the Canva platform, while answering our users' requests for more interactivity, more customization, and more flexibility, from websites to interactive presentations."

Canva Code 2.0 brings drag-and-drop editing, HTML import, and 75% faster generation to AI-built websites

The update introduces several capabilities designed to collapse the distance between generating code and publishing a polished interactive experience. Users can now create Canva Code projects directly inside other design projects — embedding interactive elements within a whiteboard, presentation deck, or standalone page. Canva has also added more than 50 new templates specifically designed for interactive designs, along with the ability to import raw HTML files from other AI coding tools and convert them into editable Canva designs.

The performance improvements are significant. Canva says it has reduced average code generation time by 75 percent and cut the median time from initial prompt to a published site by 30 percent. The company also reports that integrating Canva Code into the broader Canva editor — allowing users to treat coded outputs like any other design element — has increased active Code users by 25 percent.

Perhaps the most distinctive feature is the editing experience itself. Unlike most AI coding platforms, which require users to re-prompt or modify raw code to make visual changes, Canva Code 2.0 lets users click directly into generated elements to change text, drag and drop images from Canva's built-in library of over 120 million templates and assets, update colors and fonts through a familiar toolbar, or select a specific element and refine it through conversational AI. Every output is fully interactive and automatically adapts to different screen sizes, with a built-in mobile preview.

Wu demonstrated the drag-and-drop editing during the interview, showing how a generated conference website could be modified in real time — swapping in photos, changing fonts to branded alternatives, and editing text directly on the canvas. "The key differentiator with Canva Code is the editability and the kindness of the outputs it generates," he said, though he noted one current limitation: "We don't support moving elements around. You still have to re-prompt for that."

How Canva plans to compete with Lovable, Replit, and Bolt in the booming AI app builder market

Canva's entry into vibe coding at this scale arrives at a pivotal moment for the category. According to market research published by Luminix AI in May 2026, the vibe coding and AI app builder market has reached an estimated $4.7 billion in 2026, with projections pointing toward $12.3 billion by 2027 at roughly 38 percent compound annual growth. The research also estimates that AI-generated code now comprises approximately 41 percent of all code written globally — a figure that would have seemed inconceivable even two years ago.

The competitive landscape has grown ferocious. Lovable, which focuses on conversational, design-forward app generation for non-technical founders, has achieved what may be the fastest revenue ramp in the category's history — reportedly reaching approximately $400 million in annual recurring revenue by early 2026, according to Luminix's analysis. Replit, which transformed its browser-based IDE into a full vibe-coding engine through successive AI agent releases, has tripled its valuation to $9 billion and is targeting $1 billion in run-rate revenue by the end of 2026, per the same report. Bolt.new, which runs a full Node.js environment entirely in the browser, scaled from $4 million to $40 million in ARR within months of launching.

And then there is Canva, which brings something none of those platforms possess: a quarter-billion-user design ecosystem where brands, teams, and individuals already store their visual identities, collaborate on projects, and publish content.

Wu positioned Canva Code not as a direct competitor to these developer-focused tools but as something that fills a gap none of them have addressed. "A lot of the requests that we have been getting and the usage we're seeing is actually with using Canva Code not necessarily as just one artifact, but as part of an overall design, the visual communication they're trying to tell," Wu said. "Like when you have a sales deck, you're able to add a calculator, you're able to add a visualizer of what exactly your product does. That's something where an interactive slide can be worth a thousand pictures."

Why Canva's HTML import feature could turn it into a 'finishing layer' for every AI coding tool

One of the most strategically interesting features in Canva Code 2.0 is its HTML import capability, which allows users to take code generated by any AI tool — including ChatGPT, Claude, Lovable, or Bolt — and bring it into Canva as a fully editable design. The implication is unmistakable: Canva is positioning itself as the place where AI-generated code gets its finishing touches, regardless of where it was originally created.

When asked directly whether this amounts to positioning Canva as a "finishing layer on top of vibe coding," Wu offered a diplomatic but revealing response. "It's really a continuation of our goal to make all design as easy as possible," he said. "We've supported importing PDFs and translating them into docs, importing PowerPoint files — so in one way, it's an expansion of that. But in another way, it's really just listening to what our users want and making Canva both the most useful and the most compatible platform."

He paused, then added: "It's not that we're deliberately positioning ourselves as a specific layer, say like a finishing layer after vibe coding. We just really want to make our platform the most accessible and the most pluggable."

That language — "most pluggable" — suggests a platform strategy that doesn't require Canva to win the AI code generation race outright. If Canva becomes the default destination for making AI-generated code look professional and on-brand, it captures value from the entire category regardless of which code generation engine users prefer. The strategy also echoes the broader import capabilities that already allow Canva to ingest PowerPoint decks and PDFs from competing platforms, gradually pulling users deeper into the Canva ecosystem without demanding they abandon existing workflows.

What Canva Code can build — and where Danny Wu says it hits its limits

Wu was notably candid about the product's boundaries — a refreshing departure from the typical Silicon Valley product launch. "Canva Code is great for anything that works as a front-end app, and it's especially good when you want to leverage data, data submissions, and interactivity at small to medium scale," he said. "I'll be honest about the limitations. Canva Code is probably not going to be suitable if you're trying to build a website with complex backends, or if you're handling hundreds of thousands of visitors per day."

This candor effectively draws a line between Canva Code and the more ambitious platforms in the space. While Lovable and Replit are pushing toward full-stack application development — complete with databases, authentication, and production-grade hosting — Canva is deliberately limiting its scope to interactive front-end experiences at modest scale. The question is whether that's a strategic weakness or a disciplined focus. For the teachers, small business owners, and marketing teams that make up the bulk of Canva's user base, complex backends and high-traffic scalability are irrelevant concerns. What matters is whether they can create an interactive event page, a property listing website, or a classroom hub that looks professional and works on mobile — without hiring a developer or learning a new tool.

When asked about the AI models powering Canva Code, Wu confirmed the company uses a combination of proprietary and third-party models, including those from OpenAI and Anthropic, but declined to specify the exact mix. "We don't share the exact mix, and it does change over time," he said. "We also route differently depending on what you're asking for and which model family we think is best for handling certain requests."

Canva's AI acquisition spree — from Affinity to Leonardo.ai — now powers its vibe coding push

Canva's broader AI infrastructure has been significantly bolstered by an acquisition strategy that has accelerated over the past two years. In March 2024, the company acquired Affinity, the British creative software suite popular with Mac users, in a deal that Bloomberg reported was valued at "several hundred million pounds." Canva at the time positioned the deal as a way to compete with Adobe's flagship products — Illustrator, Photoshop, and InDesign — by gaining ownership of Affinity's Designer, Photo, and Publisher applications.

Just four months later, Canva acquired Leonardo.ai, an Australian generative AI startup with over 19 million registered users and more than a billion images generated. Canva co-founder Cameron Adams said at the time that Leonardo.ai's technology would be integrated into Canva's Magic Studio generative AI suite.

Together with these acquisitions, Canva Code is the company's attempt to layer interactive, code-driven capabilities on top of a visual design platform that has already been enhanced by professional-grade design tools and generative AI models. The company reports over 32 billion uses of its AI products to date — a staggering figure that underscores how deeply AI is now woven into everyday Canva workflows, even for users who may not think of themselves as using artificial intelligence.

Six million sites published, but Canva's retention data remains an open question

Canva's announcement highlights an impressive traction metric: users have created and published more than six million websites using Canva Code since the feature was first introduced a year ago. But the number deserves scrutiny.

Wu clarified in the interview that the six million figure represents published websites over the past year — meaning sites that were either made public or shared via password-protected or private links. "They may have published publicly, or behind a password, or as a private link. But that's the number of published websites," he said.

When asked about active retention — how many of those sites are still live and being maintained — Wu acknowledged the gap in his data. This is a meaningful distinction. In the vibe coding market, raw creation numbers can be misleading because the barrier to generating a site is so low. The more telling metric — which Canva does not yet provide — would be how many of those six million sites receive regular traffic or have been updated after initial publication.

The early use cases, however, suggest genuine utility beyond novelty. Educators and school administrators are using Canva Code to build classroom hubs, with one teacher creating bespoke webpages for each of their classrooms to keep students and parents updated on announcements. Small businesses, like Alt Marketing School, have built mini apps for fundraising training and interactive roadmaps for their members. For World Book Day, 50 readers created educational games across different subjects, complete with pedagogical guides for classroom use.

Canva Code pricing, data governance, and what enterprise customers need to know

Canva Code 2.0 is available across all of Canva's pricing tiers, including its free plan — a notable decision given that competitors like Lovable, Bolt, and Replit reserve their most capable features for paid subscribers. "As you go from, say, free to pro to business to enterprise, you would get more AI credits and be able to have higher usage of Canva Code," Wu said. "But it is available and it is usable — even free Canva accounts as well as education and not-for-profit accounts."

This credit-based approach mirrors the pricing evolution happening across the entire vibe coding category, where platforms have converged on token or credit systems that meter AI generation capacity rather than gating features behind subscription tiers. The difference is that Canva's free tier serves as an acquisition funnel for a much larger design platform, not just for the coding feature itself.

For the institutional customers Canva increasingly courts — school districts, real estate brokerages, enterprise marketing teams — data governance is a threshold concern. Wu addressed this directly. "All users and customers have full control over how their data is used," he said. "They can choose whether their prompts and data are used for AI training in the settings. For businesses and enterprises, team admins can manage this at the organizational level and guarantee that their inputs, content, and outputs won't be used for training." This opt-out approach reflects a lesson the broader industry has learned the hard way. As The Verge reported when Canva acquired Leonardo.ai, Adobe suffered significant backlash over a policy update regarding user data and AI model training — a controversy Canva appears keen to avoid.

Canva's long-term vision: closing the gap between imagination and what non-technical users can actually build

When asked where Canva Code fits into the company's long-term trajectory — and whether Canva is building toward a full-stack app development platform — Wu steered the conversation back to the company's core audience.

"A huge part of it is reducing the gap between your imagination and what's possible, especially for everyday users — people who don't have a lot of time," he said. "They don't have time to figure out deploys or MCPs or APIs. They just want to design more interactive and more dynamic communication."

He pointed to the rapid improvement in AI model capabilities as a key accelerant. "The kind of things you can create today in one shot — like a 3D visualization of a solar system — you really couldn't have trusted the output a year ago. But today, you have a really high success rate."

Whether Canva Code becomes a durable product category or a feature that gets absorbed into the platform's broader AI workflow will depend on how quickly the company can close the gap between its current front-end focus and the full-stack capabilities that increasingly define the competition. Lovable is shipping Supabase-backed apps with authentication and databases built in. Replit's agents can execute autonomous long-running builds. Bolt.new runs entire Node.js environments in a browser tab. These are fundamentally different ambitions than making a conference landing page look good.

But Canva has never won by matching the technical depth of its competitors. A decade ago, it didn't try to out-feature Adobe — it made design accessible to the 99 percent of people who would never open Photoshop. Now, in a vibe coding market where every tool can generate a working prototype from a prompt, Canva is making the same wager it made in 2012: that for most people, the hardest part was never the building. It was making it look like it came from you.

1Password moves into AI cost management, betting that token spend is the next enterprise budget crisis

michael.nunez@venturebeat.com (Michael Nuñez) — Tue, 14 Jul 2026 13:00:00 GMT

1Password on Tuesday launched AI Spend and Consumption Management, a new capability embedded in its SaaS Manager platform that gives IT and finance teams a unified, real-time view of how their organizations consume and spend on AI services from vendors including Anthropic, Cursor, and OpenAI.

The move marks the latest strategic expansion for a company that built its reputation on password management for consumers and, over the past three years, has aggressively repositioned itself as a broader identity security and SaaS governance platform for enterprise buyers. With this release, 1Password is staking a claim in one of enterprise technology's newest and most chaotic budget categories: the consumption-based cost of large language models.

"Executives want teams to build faster with AI, but that speed is creating a new kind of spending pressure," Greg Henry, 1Password's chief financial officer, said in an exclusive interview with VentureBeat. "Developers are consuming tokens at a pace that traditional budgets weren't built to manage, and IT and finance teams are being asked to forecast and justify AI investments without a clear view of what's actually driving costs."

The product, now in public preview with broad availability planned for fall 2026, connects directly to vendor admin APIs to pull token-level consumption data daily. It normalizes that data across providers into a single dashboard and allows organizations to set vendor-level spend limits, configure threshold-based alerts via Slack and email, and break down usage by team, user, vendor, and model.

Why traditional software budgets can't keep up with AI token pricing

The core challenge 1Password is targeting is structural. Traditional SaaS pricing operates on a per-seat, per-year model that is easy to budget and reconcile. AI pricing does not. Every API call to Claude, GPT-5.6, or a Cursor-powered coding assistant consumes tokens, and the cost of those tokens varies by model, by input versus output, and by the complexity of the task. A single engineering team running agentic workflows can burn through a prepaid token budget in weeks — and the finance team may not notice until the invoice arrives.
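To make the arithmetic concrete, here is a minimal Python sketch of how one call's cost depends on the model and on the split between input and output tokens. The model names and per-million-token prices are invented for illustration and are not any vendor's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical per-million-token prices in US dollars."""
    input_per_million: float
    output_per_million: float


# Made-up prices for illustration only; these are not real vendor rates.
PRICES = {
    "small-model": ModelPrice(input_per_million=0.50, output_per_million=2.00),
    "large-model": ModelPrice(input_per_million=5.00, output_per_million=20.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one request, pricing input and output separately."""
    price = PRICES[model]
    return (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000


# The same workload priced on two hypothetical models.
small = call_cost("small-model", input_tokens=200_000, output_tokens=50_000)
large = call_cost("large-model", input_tokens=200_000, output_tokens=50_000)
print(f"small-model: ${small:.2f}  large-model: ${large:.2f}")
```

Under these assumed prices the identical workload costs ten times more on the larger model, which is why per-seat budgeting does not carry over to token-priced services.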

Henry drew a sharp analogy to a problem enterprises have already lived through once. "Consumption-based pricing isn't new," he said. "We saw it arrive with cloud infrastructure, and it took years to build the tools and disciplines to manage it. AI is the next version of that shift."

That comparison resonates across the industry. When Amazon Web Services, Microsoft Azure, and Google Cloud popularized consumption-based pricing for compute and storage in the 2010s, enterprises initially lacked the tooling to monitor and optimize their cloud bills. That gap spawned an entire FinOps ecosystem — companies like CloudHealth, Spot.io, and Apptio built multi-billion-dollar businesses helping organizations understand what they were spending on cloud and why. Henry is explicitly betting that AI token spend will follow the same trajectory, and that organizations that fail to build visibility now will end up, as he put it, "paying far more than they needed to, for far longer than they should have."

The scale of the coming wave lends credibility to that bet. Goldman Sachs has estimated that token consumption from AI agents alone will grow 24 times by 2030, a projection driven by the expectation that autonomous AI systems will increasingly execute multi-step workflows — booking travel, writing and deploying code, managing customer service interactions — that generate vastly more API calls than a human sitting at a chat interface.

How 1Password's new dashboard tracks every token across Anthropic, Cursor, and OpenAI

The new capability extends 1Password SaaS Manager's existing foundation of application discovery, license management, and spend analytics. It is not a standalone product. Existing SaaS Manager customers can activate it by connecting their supported AI vendor API keys, at which point consumption data flows into a dedicated AI Consumption Management dashboard. Henry confirmed that there is no separate product or add-on fee: "AI Spend and Consumption Management is available to all 1Password SaaS Manager customers."

The system provides four core functions. First, it aggregates token usage and spend across Anthropic, Cursor, and OpenAI into a single, normalized view — eliminating the need to toggle between three separate vendor dashboards with three different reporting formats. Second, it enables budget controls: organizations can set vendor-level spend limits, configure percentage-based thresholds, and receive automated alerts when prepaid balances approach depletion. Third, it disaggregates consumption by team, user, vendor, and model, allowing finance and IT to understand not just how much is being spent, but where and by whom. Fourth, it situates AI spend within the broader SaaS portfolio, helping organizations see how token costs relate to their total software investment.
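A rough Python sketch of the first three functions might look like the following. It is not 1Password's implementation: the record fields, vendor names, limits, and threshold percentages are all hypothetical, and it assumes usage has already been normalized into one record shape.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One normalized usage row; the field names are assumptions."""
    vendor: str
    team: str
    user: str
    model: str
    cost_usd: float


def spend_by(records, key):
    """Total spend grouped by one field: vendor, team, user, or model."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cost_usd
    return dict(totals)


def threshold_alerts(vendor_spend, vendor_limits, thresholds=(0.5, 0.8, 1.0)):
    """Return (vendor, highest threshold crossed, fraction of limit used).

    This only reports; it does not block any spending.
    """
    alerts = []
    for vendor, limit in vendor_limits.items():
        used = vendor_spend.get(vendor, 0.0) / limit
        crossed = [t for t in thresholds if used >= t]
        if crossed:
            alerts.append((vendor, max(crossed), round(used, 2)))
    return alerts


# Hypothetical data for two vendors and two teams.
records = [
    UsageRecord("vendor-a", "platform", "ana", "model-x", 400.0),
    UsageRecord("vendor-a", "data", "ben", "model-y", 450.0),
    UsageRecord("vendor-b", "platform", "ana", "model-z", 100.0),
]
limits = {"vendor-a": 1000.0, "vendor-b": 1000.0}

by_vendor = spend_by(records, "vendor")
by_team = spend_by(records, "team")
alerts = threshold_alerts(by_vendor, limits)
print(by_vendor, by_team, alerts)
```

Grouping on a single normalized record type is what lets the same data answer both "how much" and "by whom" without consulting each vendor's dashboard separately.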

Notably, the system captures consumption regardless of whether a human or an AI agent generated it. "Token consumption is captured at the API level regardless of whether a human or an agent is generating it," Henry explained. "Organizations get the total consumption picture, including the spikes that agent loops can create, which can be some of the hardest usage to catch before it becomes a problem."

That agent-level visibility matters because autonomous AI systems can generate runaway costs in ways that human users typically cannot. An agentic coding assistant stuck in a retry loop, for example, can consume thousands of dollars in tokens in minutes — with no human in the loop to notice. For now, the product alerts but does not enforce. When asked whether 1Password will eventually give organizations the ability to automatically cut off spending when a threshold is crossed, Henry said the company is "actively evaluating" automatic enforcement but emphasized that visibility must come first: "You can't enforce what you can't see."
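One simple way to flag that kind of runaway usage is to compare the latest interval's spend with a trailing average. The Python sketch below illustrates the idea only; the multiplier, the floor on the baseline, and the sample figures are assumptions, and nothing here describes how 1Password detects spikes.

```python
from statistics import mean


def is_spike(history, latest, factor=5.0, min_baseline=1.0):
    """Return True when the latest spend exceeds factor times the trailing mean.

    history: earlier per-interval spend figures in dollars.
    min_baseline: floor so near-zero history does not flag every small charge.
    """
    if not history:
        return False  # nothing to compare against yet
    baseline = max(mean(history), min_baseline)
    return latest > factor * baseline


# Hypothetical per-interval spend for one team.
normal = is_spike([10.0, 12.0, 11.0], 14.0)
runaway = is_spike([10.0, 12.0, 11.0], 900.0)
print(normal, runaway)
```

A check like this only raises a flag, which matches the alert-first posture described above; cutting off spend automatically would be a separate policy decision.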

The choice of launch partners reveals where enterprise AI budgets are under the most pressure

The decision to start with Anthropic, Cursor, and OpenAI — rather than casting a wider net — reflects where enterprise AI adoption and budget strain are most concentrated right now. Henry said the choice was driven entirely by customer demand. "Anthropic, Cursor, and OpenAI are where we're seeing the highest adoption, and where token consumption can move fast and get ahead of the teams responsible for managing it," he said. The company plans to add additional vendors based on customer demand, API availability, and budget impact, though it has not committed to a specific timeline or vendor list.

The inclusion of Cursor alongside the two major foundation model providers is telling. Cursor, an AI-powered code editor that has rapidly gained traction among developers, represents a category of AI tool where consumption is particularly difficult to forecast. Unlike a chatbot interface where a user consciously types a prompt, Cursor integrates AI suggestions directly into the development workflow, generating token consumption continuously as developers write code. That ambient, always-on consumption pattern makes it especially prone to budget overruns.

Henry also addressed who inside an organization should actually own this problem — and acknowledged that the honest answer right now is no one. "When spend is fragmented across vendor dashboards and finance teams are reconciling it monthly, you're always behind," he said. "AI spend can't be treated as a finance-only or IT-only problem." He noted that the pricing differences between models have become significant enough that the choice of which AI model a team uses is now a meaningful financial decision, one that is pulling CFOs into conversations with IT, product, and engineering leaders "in ways they never had to before."

Steve May, director of IT at ServiceTrade, a 1Password customer that has been using the capability, said it addressed a concrete planning gap. "Forecasting tools for AI consumption and spend was one of our biggest gaps in planning because we didn't have a reliable way to track it," May said. He added that the visibility has "prevented overages that would have cost far more to fix after the fact."

Where 1Password fits in the fast-consolidating SaaS management market

1Password is not the only company racing to solve the AI cost management problem, but the competitive landscape is still fragmented and the category is far from mature.

Zylo, a SaaS management platform that, like 1Password, Gartner has recognized as a leader in the space, published its 2026 SaaS Management Index in January, showing that AI-native application spend surged 393% year over year in organizations with more than 10,000 employees and 108% overall. Zylo's data also revealed that ChatGPT has become the most expensed application in enterprise environments, highlighting how AI tools are entering organizations through employee credit cards and expense reports, outside formal procurement and governance workflows. Zylo has added its own token-level cost tracking for AI vendors including Anthropic, OpenAI, Cursor, and Perplexity.

Meanwhile, according to a comparison published by Coommit in May, Vendr — which focuses more on SaaS negotiation than discovery — tracks AI tools at the contract level but does not yet offer consumption-level visibility. And the FinOps Foundation reported in its 2026 State of FinOps survey that 98% of organizations now actively manage AI costs, up from just 31% in 2024. The broader SaaS management market is also consolidating rapidly. In May, Deel acquired Sastrify, a German SaaS management vendor, and began folding it into its HR platform — a signal that SaaS management capabilities are increasingly being absorbed into adjacent enterprise platforms rather than remaining standalone products.

1Password's approach differs from pure-play SaaS management competitors in one important respect: it is building AI cost management on top of an identity security platform, not a FinOps or procurement tool. The company's SaaS Manager product grew out of its 2025 acquisition of Trelica, a UK-based SaaS access management startup whose technology enabled the discovery of unsanctioned applications — so-called shadow IT. As BetaKit reported at the time of that deal, 1Password co-CEO Jeff Shiner described Trelica as "a pioneer in modern SaaS access management" and said the acquisition would accelerate 1Password's Extended Access Management product roadmap by more than a year. CRN noted that Trelica brought more than 300 SaaS integrations to the platform. That identity-first lineage gives 1Password a natural advantage in connecting spend data to specific users and teams — a linkage that matters when the question shifts from "how much are we spending on AI?" to "who is spending it, and is it delivering value?"

From password manager to platform company: 1Password's $6.8 billion bet on enterprise identity

The launch raises a question that Henry addressed head-on: whether a company that started as a consumer password manager can credibly compete in enterprise AI cost management.

"It doesn't feel like a stretch to us. It feels like a natural progression," he said. "For more than 20 years, 1Password has evolved alongside how our customers work. We started by protecting passwords. Then we helped organizations manage secrets, control access, and get visibility into the applications their teams rely on."

The company's evolution has been rapid. 1Password raised a $620 million Series C in January 2022 led by ICONIQ Growth, reaching a $6.8 billion valuation — at the time, the largest funding round ever raised by a Canadian company, according to Crunchbase. The round also attracted celebrity investors including Ryan Reynolds, Scarlett Johansson, and Robert Downey Jr. As of early 2025, BetaKit reported that 1Password had surpassed $250 million in annual recurring revenue, with B2B sales accounting for nearly three-quarters of total revenue and the company claiming to be cash-flow positive.

In May 2024, 1Password launched Extended Access Management, a platform designed to secure sign-ins across both managed and unmanaged applications and devices. That same year, it acquired Kolide for device trust and, in early 2025, Trelica for SaaS discovery. In June 2026, Gartner named 1Password a Leader in its Magic Quadrant for SaaS Management Platforms. According to 1Password's own blog post on the recognition, its SaaS Manager now supports over 400 integrations and provides visibility into a library of more than 40,000 pre-populated application profiles. Each step has moved the company further from its consumer roots and deeper into enterprise infrastructure. The AI Spend and Consumption Management launch extends that trajectory into financial operations territory — a domain where 1Password will compete not only with SaaS management vendors but potentially with dedicated FinOps platforms and the AI vendors' own billing dashboards.

Why high AI token consumption doesn't always mean wasted money

Perhaps the most revealing part of Henry's commentary concerns what organizations should actually do with the consumption data once they have it. He pushed back forcefully against the assumption that high token consumption automatically signals waste.

"A team burning through tokens may be building something genuinely valuable," he said. "A lower-usage project might not be moving the business forward at all. What matters is whether that consumption is producing enough business value to justify the spend."

Henry drew a distinction between personal productivity — "having a bot summarize your meeting or draft a quick email" — and genuine business outcomes. "What organizations need to see is where consumption is actually driving revenue, efficiency, or something that moves the needle."

That framing positions AI Spend and Consumption Management not just as a cost-cutting tool but as a decision-support system for AI investment allocation. If a CFO can see that one engineering team's heavy Claude usage is powering a product feature that drives revenue, while another team's OpenAI spend is funding low-value internal automation, the organization can reallocate budget accordingly rather than imposing across-the-board cuts.

"When costs rise faster than expected, the instinct is to cut," Henry said. "But most organizations can't yet tell which teams, models, or tools are responsible for the increase, so they end up cutting across the board rather than directing investment toward the AI projects that are actually delivering business value. Blunt cuts on a technology you're counting on for competitive advantage is not a management strategy, it's a missed opportunity."

The next enterprise budget crisis is already here — and it's priced per token

The product's current scope — three vendor integrations, alerting but not enforcement — is clearly a starting point. Henry signaled that automatic spend limits are on the roadmap and that additional vendor integrations will follow based on customer demand.

But the broader trajectory he described suggests 1Password sees this launch as a wedge into a much larger opportunity. "As traditional SaaS products add AI capabilities, their pricing models are going to follow," he said. "Organizations that build visibility and management discipline around consumption now are going to be in a much better position when that happens across the rest of their software portfolio."

If Henry is right, the chaos currently confined to AI token budgets is not a temporary growing pain but a preview of how all enterprise software will eventually be priced. A decade ago, companies scrambled to understand their cloud bills. Today, they are scrambling to understand their AI bills. The question is whether the organizations building the dashboards this time around can get ahead of the curve — or whether, as Henry warned, they will end up where so many companies ended up with cloud, realizing too late how much they were overpaying, and for how long.

AI Spend and Consumption Management is available now in public preview for 1Password SaaS Manager customers. Broad availability is planned for fall 2026.

SpaceX's Grok 4.5 launches at half the price of rivals — here's why that could rattle Anthropic and OpenAI

michael.nunez@venturebeat.com (Michael Nuñez) — Wed, 08 Jul 2026 22:00:00 GMT

Elon Musk's SpaceX released Grok 4.5 on Wednesday, the first artificial intelligence model the company has trained specifically for coding and autonomous agents — and the first tangible product of its $60 billion acquisition of the AI coding startup Cursor, completed just weeks ago.

The launch marks a pivotal test of the sprawling, vertically integrated AI empire Musk has assembled over the past six months, and of a strategy that bets developers care less about topping benchmark leaderboards than about speed, cost, and whether a model can actually do the work.

"Announcing Grok 4.5, our first model trained specifically for coding and agents," the company said in a post on X. "It was trained with Cursor and offers frontier intelligence at leading speeds and cost efficiency."

Why Grok 4.5's pricing strategy matters more than its benchmark scores

SpaceX is not claiming Grok 4.5 is the smartest model in the world. Instead, it is making an economic argument. The company says the model uses half as many tokens per task as comparable models, delivers higher throughput, and costs less than half as much — priced at $2 per million input tokens and $6 per million output tokens. That undercuts the premium tiers of rivals like Anthropic's Claude Opus line and OpenAI's frontier models by a wide margin.
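
To make the list prices concrete, here is a minimal sketch of the per-task arithmetic. The two prices come from the announcement; the token counts are assumptions chosen for illustration, not measurements of any real workload.

```python
# Grok 4.5 list prices cited in the article, in USD per million tokens.
INPUT_PRICE_PER_MILLION = 2.00
OUTPUT_PRICE_PER_MILLION = 6.00


def task_cost_usd(input_tokens, output_tokens):
    """Cost of one task at the list prices above."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    )


# Hypothetical agentic task; these token counts are assumptions.
cost = task_cost_usd(input_tokens=150_000, output_tokens=20_000)
print(f"${cost:.2f}")  # prints $0.42
```

Because cost scales linearly with token counts at fixed prices, a model that really does use half as many tokens per task would halve this figure before any difference in list price is considered.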

Musk framed the positioning candidly. "Our internal assessment is that Grok 4.5 is roughly comparable to Opus 4.7, but much faster," he wrote on X. "The combination of capability, faster speed and lower cost is what makes it competitive. We are closing the loop on real-world usefulness, not benchmarks. Hardcore engineers at Tesla & SpaceX find Grok 4.5 genuinely useful, which is what actually matters."

That framing is both a philosophy and a hedge. Independent evaluations released Wednesday suggest Grok 4.5 is genuinely competitive but not dominant on raw capability. The benchmarking firm Artificial Analysis ranked the model fourth on its GDPval-AA v2 index of real-world agentic knowledge work, with an Elo score of 1543, "behind only the latest Claude releases from Anthropic." But the cost figures are where the model stands out. Artificial Analysis measured Grok 4.5 at $0.49 per completed task — "nearly 90% cheaper than the models ahead of it on our leaderboard," the firm wrote, placing it "clearly on the Pareto frontier for performance versus cost."

For enterprise buyers, that math matters enormously. Agentic workloads — where a model works autonomously for minutes or hours, reading codebases, calling tools, and iterating on its own output — consume tokens voraciously. A model that is 90% cheaper per completed task, even if slightly less capable, changes the calculus for any engineering organization deploying agents across hundreds of developers. Investor Gavin Baker captured the market's cautious optimism: "Pareto dominant for coding by the numbers. We will see on the all-important vibes."

How the $60 billion Cursor acquisition shaped Grok 4.5's training

Grok 4.5 is the first concrete evidence of what SpaceX bought when it acquired Cursor, and the deal itself unfolded in stages. In April, SpaceX struck an unusual arrangement giving it the right to buy the coding startup for $60 billion, or to pay billions in fees and compute if it walked away, as Business Insider reported at the time. Days after SpaceX's record-setting Nasdaq debut in June, the company exercised that right, announcing an all-stock acquisition that CNBC reported amounts to roughly 3.4% dilution at the IPO valuation. SpaceX shares rose 16% on the news.

The strategic logic was always about data as much as product. Cursor's AI-first code editor generates an enormous stream of high-quality interaction data: how expert engineers write, edit, review, and debug code in real production environments. Musk said openly this spring that Cursor interaction data was being fed directly into Grok's training. Cursor, for its part, got access to SpaceX's Colossus supercomputer in Memphis — roughly 200,000 Nvidia GPUs with plans to scale toward one million — after publicly acknowledging it had been "bottlenecked by compute."

"We've partnered with SpaceXAI to train Grok 4.5," Cursor's official account posted Wednesday. "It's our most powerful model yet and the first we've built for more than software engineering." SpaceX says the model reflects that pedigree: it "excels in large codebases and handles long-running tasks that span multiple repositories, hundreds of skills, and a variety of tools" — precisely the messy, multi-file reality of professional software engineering that clean coding benchmarks often fail to capture. Early developer reactions suggest the training paid off. "Ok Grok 4.5 is wild," posted developer Evan Bacon. "It just built me this rocket tracking app with live data and a 3D globe. I might need a new benchmark after this."

Inside xAI's turbulent year of scandals, departures, and rebuilding

The polished launch belies how chaotic the road here has been. Grok has spent much of the past year in crisis. In mid-2025, the chatbot generated antisemitic content and at one point called itself "MechaHitler," episodes covered extensively by NPR and CNN. Earlier this year, its image-generation features allowed users to create sexualized deepfakes, including of children — drawing investigations from the European Commission and Britain's Ofcom, as the BBC reported, and prompting SpaceX to list the behavior as a business risk in its own IPO filings.

The organization behind the model was fracturing, too. All 11 of Musk's xAI co-founders had departed by the end of March, according to TechCrunch, and Musk publicly conceded that xAI "was not built right [the] first time around," saying he was rebuilding it "from the foundations up." Musk himself admitted at a conference this spring that Grok was "currently behind in coding" — a rare public concession from an executive not known for them.

Against that backdrop, Grok 4.5 reads as the first product of the rebuilt organization — and the first proof point for the audacious story SpaceX told public market investors. During its IPO roadshow, the company pitched a total addressable market of roughly $28 trillion, with about $26 trillion tied to AI, including a $22.7 trillion "enterprise applications" opportunity. Those numbers strained credulity even by Silicon Valley standards. A competitive, cheap coding model is the most direct route from that narrative to actual revenue, which is why Wednesday's launch carries weight far beyond a routine model release.

Grok 4.5 vs. Claude: the battle for the AI coding market

The competitive stakes are hard to overstate, because the AI coding market has been consolidating around a single leader — and it isn't Musk. Even as Cursor's revenue exploded, its market share was eroding. Spending data from Ramp cited by CNBC showed Cursor's share of the AI coding category falling from 41% in June 2025 to about 26% by May 2026, while Anthropic came to control roughly half the market. Anthropic also topped CNBC's Disruptor 50 list this year and, by Artificial Analysis's own measure, still holds the top spots on agentic performance rankings.

That is the gap Grok 4.5 is engineered to close — not by out-thinking Claude, but by underpricing it. The model's economics create a classic disruption dynamic: if it delivers most of the frontier's capability at a fraction of the cost per task, price-sensitive enterprise workloads will migrate, and incumbents will face pressure on their most profitable API traffic. The counterargument is that in coding, quality compounds. A model that resolves a complex bug correctly on the first attempt can be cheaper in practice than one that costs half as much per token but requires three tries. That is why Baker's caveat about "vibes" — the developer community's shorthand for a model's felt reliability on real work — will determine more than any launch-day benchmark.
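
The counterargument reduces to simple arithmetic. The sketch below assumes every attempt is billed in full and that each attempt consumes a similar number of tokens, so that half the per-token price means half the per-attempt cost. The dollar amounts are hypothetical; only the ratio matters.

```python
def cost_to_resolve(cost_per_attempt, attempts):
    """Total spend to reach one correct fix, assuming every attempt is billed."""
    return cost_per_attempt * attempts


# Hypothetical per-attempt costs in USD; only the 2:1 ratio matters here.
premium = cost_to_resolve(cost_per_attempt=1.00, attempts=1)
budget = cost_to_resolve(cost_per_attempt=0.50, attempts=3)

print(premium, budget)  # prints 1.0 1.5
```

Under those assumptions the half-price model ends up 50% more expensive per resolved bug, and that is before counting the engineer's time spent reviewing two failed attempts.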

There is also a structural question buried in the deal. Cursor built its business on offering developers their choice of models, including Claude and GPT. If Grok becomes the favored child inside Cursor — and Musk was already urging users to "Try out Grok 4.5 in Cursor!" within hours of launch — the product risks alienating the very users whose data made Grok 4.5 possible. Regulators, already scrutinizing Grok on safety grounds in two jurisdictions, may take a keen interest in a company that controls the training data, the model, and a dominant distribution channel simultaneously.

What Musk's trillion-dollar vertical integration bet means for AI's future

Grok 4.5 also crystallizes what Musk's frenetic dealmaking was building toward. In February, SpaceX absorbed xAI in a share-exchange merger that CNBC confirmed valued the combined company at $1.25 trillion — the largest merger of all time, valuing SpaceX at $1 trillion and xAI at $250 billion. The June IPO followed, the biggest in history, and the stock has since surged past $200 from its $135 offering price, vaulting SpaceX past Amazon and Microsoft to become the fourth most valuable company in the United States.

The result is a single public company that owns nearly the entire stack: Colossus for training compute, ambitions for orbital data centers to power future scaling, a frontier model in Grok, a distribution channel in Cursor's developer base, and captive demand from Tesla and SpaceX's own engineering organizations. Neither OpenAI nor Anthropic can fully replicate that integration; both must reach developers through third-party tools, some of which Musk now owns. Whether that concentration proves to be an unassailable moat or a regulatory target — or both — is now one of the defining questions in enterprise AI.

The next few weeks will start to answer it. Artificial Analysis says its full Intelligence Index results are forthcoming. Enterprise pilots will reveal whether the token-efficiency claims survive contact with real codebases. And Anthropic, which has answered every serious challenge this cycle with a rapid counter-release, is unlikely to cede the price-performance frontier quietly.

But the deeper story of Grok 4.5 may be what it says about where the AI race has moved. For three years, the industry's scoreboard was intelligence: whose model was smartest. Musk, arriving late and battered, has chosen to compete on a different axis entirely — whose model is cheapest to actually use. It is a telling choice from a man who built his fortune not by inventing the rocket or the electric car, but by relentlessly driving down the cost of making them. If the strategy works, Musk will have done to AI what he did to spaceflight. If it doesn't, he'll have spent $60 billion to learn that in software, unlike rockets, the cheapest ride isn't always the one engineers choose.

OpenAI launches GPT-Live, a full-duplex voice upgrade that lets ChatGPT talk more like a person

michael.nunez@venturebeat.com (Michael Nuñez) — Wed, 08 Jul 2026 18:49:06 GMT

OpenAI on Wednesday launched GPT-Live, a pair of new voice models that fundamentally redesign how people talk to ChatGPT — replacing the company's existing Advanced Voice Mode with an architecture that can listen and speak simultaneously, much like an actual human conversation.

The two models, GPT-Live-1 and GPT-Live-1 mini, are rolling out globally starting today across iOS, Android, and ChatGPT.com. GPT-Live-1 becomes the default voice model for paid ChatGPT users on the Go, Plus, and Pro tiers, while GPT-Live-1 mini serves free-tier users. OpenAI also plans to bring the models to the API, and developers can sign up to be notified.

The release marks the third generation of ChatGPT's voice technology in roughly two years — and OpenAI's clearest bid yet to turn its chatbot into something that feels less like querying a search engine and more like talking to a colleague.

Why full-duplex voice changes everything about talking to AI

The defining technical advance in GPT-Live is what OpenAI calls a "full-duplex architecture." In telecommunications, full-duplex means both parties on a phone call can talk and listen at the same time. Applied to AI, it means the model continuously processes your incoming audio even while it generates its own spoken response — no more waiting for a clean silence gap to figure out when you've finished a thought.

"Instead of processing a sequence of separate messages, GPT-Live continuously processes input while generating output," OpenAI wrote in its research blog. "The model can therefore make interaction decisions many times per second: whether to speak, continue listening, pause, interrupt, or invoke a tool."

In practice, that translates to a voice assistant that can insert conversational acknowledgments — "mhmm," "yeah," "got it" — while you're still talking, pick up on a natural pause without jumping in prematurely, and handle rapid interruptions without derailing the entire exchange.

OpenAI's previous Advanced Voice Mode, launched to paid users in September 2024, processed and generated audio within a single model but still operated on rigid turn-by-turn exchanges. As OpenAI acknowledged in the announcement, "because turn detection is based on silence, even a brief pause or background noise could be mistaken for the end of turn — causing the model to interrupt at unnatural times."

That brittleness created a product that, while impressive in demos, could be deeply frustrating in extended real-world use. Background chatter in a coffee shop could trigger a response. A thinking pause might get swallowed. The experience felt, as one researcher put it on X shortly after the announcement, like "walkie-talkie turn taking." GPT-Live is designed to end that era.
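
The failure mode is easy to reproduce in a toy model. The sketch below is not OpenAI's implementation. It is a generic silence-threshold detector, with made-up frame energies and thresholds, that shows how a thinking pause gets mistaken for the end of a turn.

```python
def detect_end_of_turn(frame_energies, silence_threshold=0.1, max_silent_frames=5):
    """Return the index of the frame at which the turn is declared over, or None.

    A frame counts as silent when its energy is below silence_threshold. The
    turn ends once max_silent_frames silent frames occur in a row.
    """
    silent_run = 0
    for index, energy in enumerate(frame_energies):
        if energy < silence_threshold:
            silent_run += 1
            if silent_run >= max_silent_frames:
                return index
        else:
            silent_run = 0
    return None


# Hypothetical utterance: speech, a five-frame thinking pause, then more speech.
speech = [0.8, 0.7, 0.9]
pause = [0.02] * 5
more_speech = [0.6, 0.8]
frames = speech + pause + more_speech

cut_at = detect_end_of_turn(frames)
print(cut_at)  # prints 7: the turn is cut off before the final two frames of speech
```

Raising the threshold makes the assistant slower to respond, while lowering it makes interruptions more frequent. A full-duplex model sidesteps that trade-off by deciding continuously rather than waiting for a silence gap.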

How OpenAI split voice and intelligence into two separate layers

GPT-Live introduces a second structural change that may prove just as consequential for enterprise adoption: it decouples the voice interaction layer from the reasoning layer.

When a user asks a straightforward question, GPT-Live handles it directly. But when the query demands web search, deeper reasoning, or more complex agentic work, GPT-Live delegates the task to a frontier model running in the background — at launch, GPT-5.5, the large language model OpenAI released in April — and continues talking with the user while the computation happens asynchronously.

"While it works, GPT-Live can keep talking with you and maintain the flow of conversation," OpenAI explains. "As we release new frontier models, we'll continuously update the model used by GPT-Live."

This delegation model is a meaningful architectural bet. Rather than building a single monolithic voice model that tries to be both conversationally fluid and deeply intelligent, OpenAI has split the problem in two: a voice-native model optimized for real-time interaction, and a separate reasoning engine that can be swapped out as the state of the art improves.

It is, in effect, a modular design — one that allows OpenAI to upgrade the intelligence of its voice assistant without retraining the voice model itself. The implications for enterprise and developer workflows are significant. A voice agent built on this architecture could maintain a natural conversation with a customer while simultaneously querying databases, searching the web, or performing multi-step reasoning — tasks that would have introduced several seconds of dead air under the old pipeline.
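
The delegation pattern can be sketched with ordinary asynchronous code. This is a simplified illustration, not OpenAI's architecture: `frontier_model` and `voice_session` are hypothetical stand-ins, and the sleep calls simulate reasoning time and speech pacing.

```python
import asyncio


async def frontier_model(query):
    """Stand-in for a slow background reasoning call (hypothetical)."""
    await asyncio.sleep(0.2)  # simulated reasoning time
    return f"answer to: {query}"


async def voice_session(query):
    """Keep the conversation going while the delegated task runs."""
    transcript = []
    task = asyncio.create_task(frontier_model(query))

    # The voice layer stays responsive instead of blocking on the result.
    while not task.done():
        transcript.append("voice: still working on that...")
        await asyncio.sleep(0.05)

    transcript.append(f"voice: {task.result()}")
    return transcript


transcript = asyncio.run(voice_session("compare these two flights"))
for line in transcript:
    print(line)
```

Because the voice loop depends only on the task's result, the background model can be replaced without touching the conversational layer, which is the modularity the design is betting on.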

The three generations of ChatGPT voice, from clunky pipeline to continuous stream

To understand how far voice AI has come, it helps to trace the three generations that led to GPT-Live.

The original ChatGPT Voice, launched in 2023, used a cascaded pipeline — a speech-to-text model (Whisper) transcribed what you said, a large language model (GPT-4) generated a text response, and a text-to-speech model converted that response back into audio. Each handoff introduced latency and lost information.

As OpenAI noted, "the complexity came at a cost: information could be lost across models, and responses were slow and stilted." That cascaded approach was the industry standard, and its limitations were well-documented. As the blog OpenHelm noted in an October 2024 analysis of OpenAI's Realtime API, the old pipeline stacked up to roughly 1,700 milliseconds of latency — nearly two full seconds of dead air before the first word of a response. Managing the state between the three separate APIs consumed an enormous amount of engineering effort.
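
The reason a cascade is slow is that its stages run one after another, so their delays add up. In the sketch below, only the roughly 1,700-millisecond total comes from the analysis cited above. The split between stages is an assumption made purely for illustration.

```python
# Hypothetical per-stage latencies in milliseconds. Only the roughly 1,700 ms
# total comes from the article; the split between stages is an assumption.
STAGE_LATENCY_MS = {
    "speech_to_text": 500,
    "language_model": 700,
    "text_to_speech": 500,
}


def time_to_first_audio_ms(stage_latencies):
    """In a cascaded pipeline each stage waits for the previous one, so delays add."""
    return sum(stage_latencies.values())


total = time_to_first_audio_ms(STAGE_LATENCY_MS)
print(f"{total} ms before the first spoken word")
```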

OpenAI's Advanced Voice Mode, which began its limited rollout to paid ChatGPT Plus users in July 2024 before expanding more broadly in September 2024, collapsed that three-model pipeline into a single model that processed audio natively. As TechCrunch reported at the time, the rollout came with five new voices — Arbor, Maple, Sol, Spruce, and Vale — alongside improved accent handling and smoother conversations.

The feature also launched on the web in November 2024, extending it beyond mobile. But Advanced Voice Mode still operated through discrete, alternating turns — and it launched into the shadow of a PR debacle that OpenAI is still working to leave behind.

The Scarlett Johansson controversy still shadows OpenAI's voice ambitions

Advanced Voice Mode arrived in the wake of one of OpenAI's most damaging self-inflicted crises. During the GPT-4o launch in May 2024, the company showcased a voice called "Sky" that many listeners immediately noted sounded strikingly similar to Scarlett Johansson, who famously voiced an AI companion in the 2013 film Her.

Johansson said she had declined OpenAI CEO Sam Altman's offer to voice the system, then was "shocked, angered and in disbelief" when the product launched with a voice her own friends couldn't distinguish from hers, as NBC News reported. Altman had tweeted just the word "her" the day the product launched.

OpenAI pulled the voice and apologized, but the incident drew public scrutiny from SAG-AFTRA and members of Congress, and crystallized broader concerns about AI companies moving fast with creative IP.

The Hollywood labor union said the issue underscored "why we're strongly championing federal legislation that would protect their voices and likenesses ... from unauthorized digital replication," as NBC News reported. Forbes contributor Paul Tassi wrote at the time that Altman, "by holding up Her on a pedestal of something to strive for, has missed the point of that film" — in which the protagonist's relationship with his AI companion ultimately does him more harm than good.

GPT-Live appears designed, in part, to move past those controversies. OpenAI says it has "remastered the nine distinct voices in ChatGPT for GPT-Live" and notes the system "is designed for conversation, not voice impersonation," with "safeguards to prevent it from imitating a real person's voice."

What 150 million weekly voice users will actually notice today

OpenAI disclosed that more than 150 million people talk to ChatGPT using voice and dictation features each week — a notable slice of the platform's 900 million total weekly active users. The voice experience has grown into a substantial product in its own right, used for language practice, bedtime stories, commute-time chat, and hands-free everyday help.

The new product features reflect that usage. GPT-Live introduces rich visual cards that surface during voice conversations — weather forecasts, stock data, sports scores, and maps — giving users something to glance at without breaking the flow of speech.

Users can now choose between three reasoning levels for answers: Instant for quick responses, Medium for moderate thinking, and High for more complex work. And if you take a moment to think, "ChatGPT Voice now waits instead of jumping in and interrupting," OpenAI wrote. "If you ask it to stay quiet and listen, it will. And when there's background noise, like passing traffic or nearby conversations, ChatGPT is better at focusing on your voice instead of getting distracted."

Early reactions from users with preview access were cautiously positive. "I had early access to sol. it is a phenomenal model," wrote one user on X, adding that it is "much better at frontend, long context knowledge work, and its vibes are much better." Another observer cut to the heart of the matter: "The smarts are not new here, GPT-Live hands hard questions to GPT-5.5. What is new is the feel: full-duplex voice that listens while it talks."

New voice-specific safety tests reveal where the risks still live

The GPT-Live system card, published alongside the announcement, reveals a safety strategy built around the particular risks of real-time voice interaction — a domain where the speed and intimacy of conversation create hazards that text-based chat does not.

OpenAI expanded its safety evaluations to include audio-native tests, using both real user voice samples (from those who opted in) and synthetically generated prompts targeting edge cases across categories like self-harm, sexual content, illicit behavior, emotional reliance, mental health, and hate speech.

On the synthetic evaluations — which OpenAI described as deliberately adversarial — GPT-Live-1 showed substantial improvements over Advanced Voice Mode. In illicit behavior, for instance, the safety score rose from 0.63 to 0.97. On self-harm, it climbed from 0.72 to 0.98. Hate speech achieved a perfect 1.00, up from 0.87.

On the production-prompt evaluations — which used real user audio and reflected more ambiguous, borderline scenarios — the picture was more mixed. GPT-Live-1 matched or improved on Advanced Voice Mode in most categories but showed a slight regression on emotional reliance (from 0.88 to 0.82), though OpenAI noted the change was not statistically significant.

The company built real-time safeguards that can intervene while the model is speaking — steering toward safer responses, surfacing crisis resources, or ending the voice conversation entirely in higher-risk situations. It also designed additional protections for teen users and adapted self-harm support flows for voice, including crisis helpline integration.

Perhaps most notably, OpenAI said it is "rolling out longer-term measurement and post-launch monitoring focused on emotional reliance" — an acknowledgment that the very naturalness GPT-Live strives for creates its own category of risk.

Google, ByteDance, and Nvidia are already in the full-duplex race

While OpenAI was refining its safety guardrails, its rivals were shipping full-duplex systems of their own. Google's Gemini Live, which supports full-duplex conversation alongside camera and screen sharing — capabilities GPT-Live notably lacks at launch — is already available in the Gemini app. Google released Gemini 3.1 Flash Live in March as its highest-quality real-time audio model, targeting low-latency voice interactions for developers.

ByteDance launched Seeduplex in April inside its Doubao app, claiming it was the first full-duplex speech AI deployed at production scale. ByteDance reported roughly a 50 percent reduction in false-response and false-interruption rates compared with its previous half-duplex system. And Nvidia's PersonaPlex, released in January, brought customizable voice and role control to full-duplex models, removing a constraint that had locked natural-sounding models into a single fixed voice.

The competitive picture is clear: full-duplex voice interaction is quickly becoming table stakes for consumer AI products, not a differentiator. OpenAI's advantage lies in the scale of its existing user base, its integration with GPT-5.5's reasoning capabilities, and the breadth of the ChatGPT ecosystem.

But the window in which any one company has a monopoly on natural-sounding voice AI has already closed. OpenAI also acknowledged several gaps. GPT-Live does not support voice with video or screen sharing at launch. Language support is limited, with the company noting that "for certain languages, the model may have a non-native accent or gaps in fluency." And API access is not available on day one, meaning enterprise developers cannot yet build on GPT-Live directly — a constraint that will slow the model's penetration into commercial voice-agent workflows where competitors like Google, ElevenLabs, and Deepgram already have developer-facing products.

The end of the chat box may be closer than anyone expected

GPT-Live is OpenAI's most significant bet yet on voice as the primary interface for AI: not just a convenience feature bolted onto a text chatbot, but a purpose-built interaction layer that sits between the user and the company's most powerful models.

"Over time, we believe this research will also unlock the ability to use voice for increasingly complex, longer-running, and more agentic work," OpenAI wrote. That ambition — using natural voice as the front end for autonomous AI agents that can perform multi-step tasks — is the logical endpoint of the full-duplex plus delegation architecture.

Imagine telling your phone to book a flight, negotiate with your insurance company, or debug a production server, all through a conversation that feels as natural as talking to an assistant who also happens to have the intelligence of a frontier AI model.

Two years ago, talking to ChatGPT meant dictating into a microphone and waiting nearly two seconds for a stilted reply. One year ago, it meant a smoother exchange that still felt like a polite, slightly awkward phone call with someone who insisted on waiting for you to finish every sentence. Today, it means something closer to a real conversation — imperfect, still constrained in some languages and missing video, but unmistakably closer. OpenAI once got into trouble for wanting to recreate the movie Her. With GPT-Live, the company may finally be reckoning with the harder question the film actually posed: not whether AI can sound human enough to talk to, but what happens to us when it does.

Slack's Slackbot can now pull your CRM data, generate charts, and send DocuSigns, all from a chat message

michael.nunez@venturebeat.com (Michael Nuñez) — Wed, 08 Jul 2026 12:00:00 GMT

Five years and $27.7 billion after Salesforce acquired Slack, the two products are finally starting to function as a single system. On Wednesday, Slack launched an integration that connects Slackbot — the personal AI agent built into every workspace — to the entire Salesforce platform, including CRM data, Tableau analytics, Data 360 customer profiles, and a growing constellation of third-party applications, all through a single conversational prompt.

The mechanism behind the expansion is a set of dedicated Model Context Protocol (MCP) servers from Salesforce that connect Slackbot to the company's Headless 360 infrastructure. In practical terms, a salesperson can now ask Slackbot for a customer's deal history, receive a live Tableau visualization of pipeline trends, update a CRM record, and trigger a DocuSign approval — without ever switching tabs or logging into another application. According to Slack, the Salesforce IT team has already used this architecture to save its 1,500-plus engineers "thousands of custom coding hours annually."

The timing is not accidental. Slack is making this move amid escalating competitive pressure from Microsoft Teams, which claims 320 million-plus monthly active users and has Copilot embedded across the Office suite, and from Google, which continues to weave Gemini deeper into Workspace. And just days ago, The Information reported that some smaller companies are using Anthropic's Claude to replace Salesforce CRM entirely — one Atlanta-based property management firm with about 55 employees reportedly saved around $100,000 annually by building a custom replacement using Claude Code and Replit.

Against that backdrop, Slack CMO Ryan Gavin sat down for an exclusive interview with VentureBeat to frame the announcement and argue that the company's future depends on an idea he calls "multiplayer AI" — and that the 25 years of customer data locked inside Salesforce is an asset no vibe-coded alternative can replicate.

Why Slack's CMO believes 'multiplayer AI' is the next big enterprise battleground

Gavin's core argument is that the enterprise AI conversation has been stuck in single-player mode for too long, and that Slack is uniquely positioned to break it open.

"So much of what we've seen are just these incredible tools that have largely been single-player, incredible tools for individual productivity, helping people complete tasks and write code," Gavin told VentureBeat. "But as we've always known at Slack ever since our inception, work is a team sport. For AI to really take hold in the enterprise, it has to be multiplayer."

The distinction matters commercially. Most AI assistants today — ChatGPT, Claude, Copilot — default to one-on-one conversations with a single user. A researcher queries a model, gets a response, and acts on it alone. The insight stays in a private chat window, invisible to colleagues. Gavin argues this creates a new version of the tab-switching problem that plagued pre-AI enterprise software, except now employees are also navigating dozens of individual agent interfaces on top of their existing applications.

"It's going to benefit almost no one if every enterprise application out there spawns hundreds of agent babies, and employees end up in a worse world than they were before," Gavin said.

Slack's answer is to make Slackbot the orchestration layer. Because everything happens in shared channels, any action an agent takes — pulling a customer profile, flagging a deal risk, updating a Jira ticket — is visible to the entire team. A colleague can redirect, build on, or correct the agent's work in real time.

How MCP and Salesforce's Headless 360 platform power Slackbot's new capabilities

The technical backbone of the announcement is the Model Context Protocol, an open standard originally developed by Anthropic that defines how AI models discover and invoke external tools. MCP has seen rapid adoption across the AI tooling ecosystem. By early 2026, it had been adopted by Claude Code, Cursor, GitHub Copilot, and OpenAI's tooling, with managed hosting available from AWS, Cloudflare, and Vercel. As a DEV Community explainer puts it, MCP "is the closest thing the AI tooling ecosystem has to a standard."

In this implementation, Salesforce exposes its platform capabilities — CRM records, Tableau visualizations, Data 360 customer profiles, Agentforce agents — as MCP servers. Slackbot operates as an MCP client, connecting to those servers and routing user queries to the appropriate back-end system. When a user asks Slackbot about a customer, the bot discovers which MCP tools are relevant, calls them, and synthesizes the results into a single response — all within the Slack conversation.
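
The discover, call, and synthesize loop described above can be sketched in a few lines of Python. This is a minimal illustration only, not Slack's or Salesforce's implementation and not the MCP SDK: the `ToolServer` class, the tool names, and the keyword-based relevance check are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToolServer:
    """Hypothetical stand-in for an MCP server: a named bundle of tools."""
    name: str
    # tool name -> (description, handler taking a customer name)
    tools: Dict[str, tuple] = field(default_factory=dict)

    def list_tools(self) -> Dict[str, str]:
        # Discovery step: the client asks which tools exist.
        return {tool: desc for tool, (desc, _) in self.tools.items()}

    def call_tool(self, tool: str, customer: str) -> str:
        # Invocation step: the client calls one tool by name.
        _, handler = self.tools[tool]
        return handler(customer)


def answer(query: str, customer: str, servers: List[ToolServer]) -> str:
    """Discover tools, call the relevant ones, and merge the results."""
    results = []
    for server in servers:
        for tool, desc in server.list_tools().items():
            # Toy relevance check (an assumption): keyword overlap between
            # the query and the tool description. In the article's setup,
            # the bot itself decides which tools are relevant.
            if any(word in desc.lower() for word in query.lower().split()):
                output = server.call_tool(tool, customer)
                results.append(f"{server.name}.{tool}: {output}")
    return "\n".join(results) if results else "No relevant tools found."


# Two hypothetical servers with one tool each.
crm = ToolServer("crm", {
    "get_account": ("look up a customer account record",
                    lambda c: f"{c} is an active account"),
})
analytics = ToolServer("analytics", {
    "get_dashboard": ("fetch a revenue dashboard",
                      lambda c: f"dashboard for {c}"),
})

print(answer("customer account status", "Acme", [crm, analytics]))
```

In this toy setup only the CRM tool matches the query, so the reply contains a single line from that server. A permission check like the one the article describes would sit in front of `call_tool`.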

Gavin explained the architecture in simple terms: "Salesforce is extending what has always been our open platform through our Headless 360 strategy — making all of these MCP endpoints available. And then Slackbot acts as an MCP client, connecting to those MCP servers and bringing all that data in within the confines of a trusted permission platform."

That permission layer is critical. Slackbot respects each user's Salesforce permissions, meaning a marketing coordinator cannot accidentally access sales pipeline data they are not authorized to see. Validation rules, field-level security, and org-wide data boundary configurations carry over automatically. For admins, setup requires no custom integration code — Salesforce MCP servers can be discovered, installed, and governed from a single UI using the existing Slack-Salesforce connection.

Salesforce first introduced the Headless 360 concept at its TDX developer conference in April, positioning it as an API-driven layer that exposes the platform's data, workflows, and governance controls so that software agents, rather than human users, can execute business processes directly. As CIO.com reported at the time, analysts viewed the move as an effort by Salesforce "to position itself as a central layer for managing agent-driven operations across different business functions."

Slack says it's betting on openness, not on any single AI protocol

When asked whether Slack is making a risky bet on MCP as a protocol — given that standards in AI tooling can shift rapidly — Gavin reframed the question entirely.

"We're not betting on MCP, per se. We're betting on what we've always bet on, which is that Slack is an open platform," Gavin told VentureBeat. "MCP happens to be the best agent-to-agent protocol that the industry is rallying around right now, but if something better came out tomorrow, you'd see the same pattern from Slack — we're going to stay open. MCP and APIs are simply tools that facilitate that."

That open-platform philosophy is central to Slack's identity and, Gavin argues, its competitive differentiation. Slack already hosts more than 2,600 app integrations. The new MCP-native partner ecosystem includes Atlassian, Box, DocuSign, Canva, Lucid, Zoom, and more than 25 additional companies, each of whose agents can be added directly to shared Slack channels. MuleSoft Agent, now connected to Slackbot, helps manage integrations for the team — checking system health or surfacing critical error alerts in the same workspace where the team is already collaborating.

But MCP is not without trade-offs. The protocol requires tool discovery on every connection, and large tool libraries can consume significant context tokens. One technical analysis noted that a server exposing 300 tools could cost 5,000 to 10,000 tokens per session before the model does any useful work. For an enterprise like Salesforce with hundreds of potential tools across CRM, analytics, and service platforms, careful filtering and segmentation of MCP servers become essential design decisions — a challenge the company will need to navigate as the ecosystem scales.
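
A quick back-of-the-envelope calculation shows what the cited figures imply and why filtering matters. The sketch below assumes discovery cost scales linearly with the number of tools, which the analysis does not state, and the 30-tool subset is a hypothetical example.

```python
def discovery_overhead(tool_count: int, tokens_per_tool: float) -> float:
    """Tokens spent describing tools before any useful work (linear assumption)."""
    return tool_count * tokens_per_tool


# Per-tool cost implied by the cited range for a 300-tool server.
low_per_tool = 5_000 / 300    # about 16.7 tokens per tool
high_per_tool = 10_000 / 300  # about 33.3 tokens per tool

# Hypothetical: expose only a 30-tool subset relevant to the user's role.
filtered_low = discovery_overhead(30, low_per_tool)
filtered_high = discovery_overhead(30, high_per_tool)

print(f"per tool: {low_per_tool:.1f} to {high_per_tool:.1f} tokens")
print(f"30-tool subset: {filtered_low:.0f} to {filtered_high:.0f} tokens per session")
```

Under that linear assumption, trimming the exposed set to a tenth of its size cuts the per-session overhead by the same factor.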

Inside Slack's complicated relationship with Anthropic and the Claude question

Perhaps the most delicate topic in the interview concerned Slack's relationship with Anthropic, the AI lab behind Claude — and one of Slack's most visible power users. Just last week, Anthropic launched Claude Tag, a persistent AI teammate that works inside Slack channels, prompting confusion among Salesforce employees who worried it competes directly with Slackbot and Agentforce. The Information reported internal anxiety about whether Salesforce was welcoming a competitor into its own living room. Salesforce has financial reasons to maintain the partnership: the company reportedly expects to spend $300 million on Anthropic tokens this year and holds a stake in Anthropic.

Gavin addressed the tension head-on, framing it as a feature of Slack's platform strategy rather than a threat.

"We're incredibly excited and bullish about what Anthropic is bringing into Slack. Period. End of statement," Gavin said. He noted that Anthropic "is building roughly 65% of their code with Claude in Slack," and pointed out that ChatGPT was originally built in Slack, as was Perplexity.

"Building nowadays happens in the open, and every company is going to be building in the open with tools like this, and you need a platform to build in the open," Gavin said.

His argument is that feature overlap between Slackbot, Claude Tag, and other third-party agents is "actually a feature, not a bug" — a sign of a healthy platform rather than a competitive vulnerability. He compared it to an ecosystem where multiple products serve similar needs but win on craftsmanship, ease of use, and integration depth.

"One of the reasons Slackbot has been the fastest-adopted feature in Salesforce history is the simplicity, the approachability — underpinned by the trust that comes from having an agent that knows me, knows my tone, knows my work, knows my people, knows my data," Gavin said.

The distinction Slack draws is structural: Slackbot has access to a user's full workspace context, Salesforce data, permissions, and connected applications by default. Claude Tag, by contrast, only sees the channels it is explicitly added to. For Slack's leadership, that asymmetry is the moat.

How Slack plans to compete with Microsoft Teams and Google in the AI era

Asked directly about competitive positioning against Microsoft Teams and Google Workspace, Gavin pointed to Slack's open channel architecture as the differentiator no competitor can replicate.

"If you spend any time in Teams, it's a lovely tool for chat, direct messages, and video, but it has no platform for open communication across organizations," Gavin said. "Its SharePoint-based architecture is fundamentally limiting."

He cited Shopify as an example, where an internal AI agent called River is deployed across approximately 4,400 channels serving 6,000 employees. He also referenced a Fortune report noting that Microsoft's own head of AI mandated that his team run on Slack rather than Teams — a pointed detail Gavin clearly relished. "There's a reason for that," he said. "We're in an era right now where openness matters, and all the other tools you mentioned, they're still relatively closed."

The competitive pressure is real and intensifying. Microsoft has integrated Copilot across its entire productivity suite, giving it a distribution advantage that reaches virtually every Fortune 500 company. Google has been similarly aggressive with Gemini across Workspace. And new entrants are crowding the market: a startup called Viktor, which embeds AI agents inside Slack and Teams workspaces, recently raised a $75 million Series A led by Accel — with Slack cofounders Stewart Butterfield and Cal Henderson participating as angel investors.

Box, one of the enterprise customers highlighted in the announcement, told Slack it aims to have its sellers complete 75 to 80 percent of their work inside Slack. Gavin repeated that figure as evidence that the platform is becoming the default workspace for entire organizations, not just engineering teams — a shift he believes accelerates as AI makes every employee a builder.

Slack's biggest long-term play is making Salesforce's CRM useful to everyone in the company

Gavin saved what he considers the most underappreciated element of the announcement for last: the democratization of Salesforce's CRM.

For 25 years, Salesforce's CRM has been used primarily by sales, service, and marketing professionals — a relatively modest percentage of a company's total workforce. The promise of Slackbot as a conversational interface is that any employee, regardless of their role or technical fluency, can now query and act on CRM data simply by asking a question in natural language.

"What most people don't realize is that this democratization of CRM is going to take its usage from a modest percentage of employees to the entire enterprise," Gavin said. "When you can make systems like Data 360 or Agentforce for Sales accessible to the entire employee base — not just a percentage — think about how much more valuable those investments become."

He cited Engine, a company that handles 800,000 customer inquiries a year, as an example. Previously, answering a customer inquiry required a specific employee with access to a specific tool to look up a customer's history. Now, anyone in the company can ask Slackbot and see a complete customer profile, review case history, and write updates — all without being retrained or learning a new interface. Engine's CEO Elia Wallen, in a statement sent to VentureBeat, described the integration as enabling employees to "make data-driven decisions and take action without leaving the conversation."

The financial logic is straightforward: if Salesforce can make its platform useful to 100 percent of a customer's workforce rather than the 20 or 30 percent who currently hold licenses, the value of the existing Salesforce investment multiplies without requiring a proportional increase in spending. That pitch becomes especially potent at a time when CIOs are scrutinizing every line of their AI budgets.

What analysts and CIOs should watch as Slack rolls out its biggest AI update yet

The announcement is a significant architectural evolution for Slack, but several questions remain unanswered.

First, pricing. The company did not directly address whether Slackbot's MCP-powered Salesforce integration will require additional SKUs or license tiers. As Info-Tech Research Group analyst Scott Bickley cautioned when Headless 360 was first announced in April, "Salesforce's MO seems to be to announce new capabilities that require SKUs. CIOs should be asking about pricing now."

Second, performance. Routing user queries through MCP servers to Salesforce back-end systems introduces latency that could affect the conversational feel Slack prides itself on. Neither the press release nor the interview disclosed SLAs for MCP tool calls — a gap that enterprise buyers will want addressed.

Third, the competitive dynamics of the platform play. Slack's open-platform philosophy invites powerful partners like Anthropic and OpenAI into its ecosystem, but those same partners are building their own surfaces for enterprise work. Anthropic reportedly plans to expand Claude Tag to Microsoft Teams, email, and other project management tools — meaning the partner Salesforce is paying hundreds of millions a year is building the infrastructure to be useful without Slack at all.

And fourth, the broader existential question facing all enterprise software: whether AI agents will ultimately reduce the need for CRM systems entirely. Gavin's pitch — that Slack makes CRM more valuable by making it more accessible — is the inverse of the bear case. The market will ultimately decide which thesis prevails.

Salesforce reported record revenue of $11.1 billion for the first quarter of fiscal 2027, with Agentforce ARR surpassing $1 billion for the first time and combined AI and data ARR reaching $3.4 billion. Those numbers suggest the AI strategy is beginning to generate real revenue, even as the company navigates a market that remains uncertain about the long-term trajectory of legacy enterprise software.

"Slack has quickly moved from this beloved collaboration tool from the last ten years to now this multiplayer AI platform that we call a work operating system," Gavin said.

Five years ago, Salesforce paid $27.7 billion for what was, at its core, a very good group chat application. On Wednesday, it started trying to prove that group chat was never the product — it was the foundation. In the age of AI agents, the most valuable real estate in enterprise software may not be the database where the data lives. It may be the conversation where the decisions get made.

Anthropic brings Claude Cowork to mobile and web as usage data shows most users aren’t coding

michael.nunez@venturebeat.com (Michael Nuñez) — Tue, 07 Jul 2026 16:00:00 GMT

Anthropic on Tuesday launched Claude Cowork on mobile and web, expanding a tool that has quietly become the company's bridge between the developer-centric world of AI coding agents and the far larger market of knowledge workers who never open a terminal.

The rollout, which begins in beta with Max subscribers before expanding to additional plans, marks a strategic inflection for Anthropic. It transforms Cowork from a desktop-only agent into a cross-device platform where tasks can start on a laptop, continue autonomously in the background, and be reviewed from a phone — even after the user closes the app entirely.

"Your work goes everywhere with you, and keeps going without you," Anthropic writes in its announcement.

The timing is deliberate. Alongside the mobile launch, Anthropic published usage data from 1.2 million anonymized Claude Cowork sessions sampled between May 11 and May 31, drawn from more than 600,000 organizations. The data paints a striking picture: the overwhelming majority of what people do with Cowork has nothing to do with writing software.

The biggest AI story nobody's talking about

The numbers tell a story that cuts against the dominant narrative in enterprise AI, which has fixated on coding assistants and developer productivity as the primary use case for large language models.

Business process and operations — tasks like pulling scattered updates into a single report, building onboarding checklists, and reconciling spreadsheets — accounted for 33.4% of all sampled Cowork sessions, making it the single largest category by a wide margin. Content creation and copywriting — producing drafts, slide decks, posts, and proposals — came in second at 16.4%.

Together, those two categories make up roughly half (49.8%) of sampled Claude Cowork sessions. Software development, by contrast, accounted for just 8.7%. DevOps and infrastructure followed at 7%, with research and intelligence at 6.4%, data analysis and business intelligence at 5.8%, document processing and extraction at 4.1%, and sales and revenue operations at 4%.

The remaining 12 categories each represented less than 4% of usage, including personal assistance at 3.8%, education at 2.4%, and meeting intelligence at 1.8%.
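
The reported shares can be tallied directly. The short Python sketch below uses only the percentages quoted above. The remainder figure assumes the categories are mutually exclusive and sum to 100%, which the article does not state explicitly.

```python
# Category shares of sampled Cowork sessions, in percent, as reported.
shares = {
    "Business process and operations": 33.4,
    "Content creation and copywriting": 16.4,
    "Software development": 8.7,
    "DevOps and infrastructure": 7.0,
    "Research and intelligence": 6.4,
    "Data analysis and business intelligence": 5.8,
    "Document processing and extraction": 4.1,
    "Sales and revenue operations": 4.0,
}

top_two = round(shares["Business process and operations"]
                + shares["Content creation and copywriting"], 1)
named_total = round(sum(shares.values()), 1)

# Assumption: shares sum to 100%, so the rest belongs to the 12 smaller categories.
remainder = round(100 - named_total, 1)

print(f"top two categories: {top_two}%")
print(f"eight largest categories: {named_total}%")
print(f"remaining 12 categories combined: {remainder}%")
```

On those assumptions, the eight largest categories cover about 86% of sampled sessions, leaving roughly 14% spread across the other twelve.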

Anthropic describes these dominant use cases as "the work around the work" — tasks that span nearly every role in an organization but rarely appear in anyone's core job description. "People are using it for a variety of tasks that aren't necessarily the hallmark of a specific role, but instead represent the connective work around a role that moves projects forward and keeps businesses running," the company writes. "That means tasks like drafting a status update, building a slide deck, or condensing reams of research into a single report."

That phrase — "the work around the work" — is Anthropic's attempt to define and claim an entirely new category of AI productivity. It's a calculated reframing: rather than positioning AI as a tool that replaces what professionals do, Anthropic is arguing that the most valuable current application is handling everything professionals do around their actual expertise.

What mobile access changes — and what it doesn't

The expansion to mobile and web introduces three concrete capabilities that reflect how Anthropic envisions Cowork fitting into daily workflows.

First, sessions now sync across devices. A user can start a task at their desk, check on its progress from a phone, and retrieve the finished output from any device. Second — and arguably more significant — Cowork can now run tasks in the background with no device online at all. Users can schedule work for a specific time, and Claude will execute it autonomously. Anthropic offers the example of setting Monday morning client prep for 6 a.m.: "Claude works through the email threads, transcripts, and recent news, builds the briefing doc, and leaves the follow-up email drafted but unsent. Review it over coffee."

Third, when Claude encounters a decision that requires human judgment, it surfaces the question to the user's phone. "Nothing ships until you've reviewed and approved it," Anthropic states.

Desktop remains the most fully featured surface, with access to local files and the browser. But the web version also opens Cowork to users who cannot install a desktop application — a meaningful expansion in enterprise environments where IT departments control software installation.

The company also unified its interface: on web and desktop, chat and Cowork now share a single home screen, and projects and artifacts persist across both modes.

To encourage adoption, Anthropic is extending doubled Cowork usage limits through August 5.

The strategic logic: why Anthropic is chasing the non-developer

The usage data and the mobile launch together reveal a company executing a two-track strategy. Claude Code, its terminal-based coding agent, dominates among software developers. But Cowork is designed to capture the vastly larger population of professionals whose work involves creating, organizing, and communicating information rather than writing code.

The contrast between the two products is instructive. As Anthropic notes, Claude Code "is most often used by software developers for the key parts of their role: building, debugging, and shipping code." When developers do use Cowork, they tend to use it not for programming but for the communications-focused work that surrounds every role — status updates, documentation, and coordination.

This pattern — where AI handles the connective tissue of work rather than its core substance — aligns with what Anthropic describes as people using "Claude Cowork to assemble and structure the information they can use to act on their expertise." The company illustrates this with three examples: a lawyer using Cowork for document formatting and filing while reserving legal judgment for themselves, a hiring manager synthesizing interview feedback while spending more time on candidate conversations, and a team lead producing a slide deck that explains a decision while focusing on actually making that decision.

The implications for Anthropic's business model are significant. Developer-focused tools, while high-profile, serve a relatively narrow market. The Ramp AI Index published in May showed Anthropic pulling ahead of OpenAI in business adoption for the first time, with 34.4% of firms paying for Anthropic's services compared to OpenAI's 32.3%, a sign that the company's enterprise push is gaining traction. Claude Code was identified as the primary driver of that shift. But Cowork targets an addressable market that is orders of magnitude larger: every knowledge worker with a laptop, a pile of spreadsheets, and a slide deck due by Friday.

A crowded field gets more competitive

The mobile launch arrives during one of the busiest and most turbulent stretches in Anthropic's history.

Just last week, Anthropic launched Claude Sonnet 5, a new model that narrows the performance gap with its more expensive Opus-class models while maintaining lower pricing. The model is available at introductory pricing of $2 per million input tokens through August 31 before rising to $3 per million input tokens. Sonnet 5 serves as the engine underneath Cowork, and its improved agentic capabilities — better reasoning, tool use, and sustained task completion — directly enhance Cowork's ability to handle complex, multi-step workflows.
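
For a sense of scale, the input-token price change can be worked through in Python. The 50-million-token monthly volume is a hypothetical workload chosen for illustration, and the calculation covers input tokens only because output pricing is not given here.

```python
def input_cost_usd(input_tokens: int, price_per_million_usd: float) -> float:
    """Cost of input tokens at a given price per million tokens."""
    return input_tokens / 1_000_000 * price_per_million_usd


monthly_input_tokens = 50_000_000  # hypothetical workload

intro_cost = input_cost_usd(monthly_input_tokens, 2.0)     # introductory price
standard_cost = input_cost_usd(monthly_input_tokens, 3.0)  # price after August 31
increase_pct = (standard_cost - intro_cost) / intro_cost * 100

print(f"introductory: ${intro_cost:.2f}, standard: ${standard_cost:.2f}")
print(f"increase: {increase_pct:.0f}%")
```

Whatever the volume, the move from $2 to $3 per million tokens is a 50% increase in input cost.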

Two weeks before that, Anthropic released Claude Tag, a Slack-native AI agent designed for team collaboration. Where Cowork focuses on individual task delegation, Claude Tag operates as a multiplayer tool — a single Claude identity that everyone in a Slack channel can interact with, building context from conversations over time.

According to Anthropic's announcement, 65% of the company's own product team's code is created by its internal version of Claude Tag. Fortune reported that Anthropic's head of product for Claude Code and Cowork, Cat Wu, described the distinction: "Claude Code, Cowork, and chat are very single-player, whereas Claude Tag is built to be interactive and multiplayer."

Together, Cowork and Claude Tag represent a pincer strategy: Cowork captures individual productivity workflows across devices, while Claude Tag embeds AI into team communication channels. Both are designed to push Anthropic deeper into enterprise operations, beyond the developer seat.

The security question looms

The expansion also arrives against a backdrop of unresolved security concerns. On July 1, security firm Armadin — led by Mandiant founder Kevin Mandia — published research detailing what it described as a full sandbox escape in Claude Cowork on Windows, as reported by SiliconANGLE. The attack chain involved DLL sideloading against the Claude desktop executable to gain trusted access to Cowork's virtual machine service, then exploiting undocumented parameters to achieve root access and bypass network restrictions.

Anthropic responded that the vulnerability did not qualify as a security issue because exploiting it requires an attacker to already have local code execution on the host machine. Armadin, however, raised a broader concern: that deploying local virtual machines on nontechnical users' systems creates visibility gaps that endpoint security products struggle to monitor.

This tension takes on new dimensions as Cowork moves to mobile and web. The web and mobile versions run tasks server-side rather than in a local virtual machine, which eliminates the specific attack surface Armadin identified but introduces different questions about data handling, especially for scheduled background tasks that process email threads, calendar data, and documents without real-time user oversight.

Anthropic's announcement states that "the decisions still come to you" and that nothing ships without review and approval. But as Cowork takes on increasingly complex autonomous workflows — processing contract folders, building client briefings from multiple data sources, drafting emails — the surface area for prompt injection and data exposure grows correspondingly.

When Cowork first launched in January, TechCrunch reported that Anthropic explicitly warned about prompt injection risks, noting in its blog post: "These risks aren't new with Cowork, but it might be the first time you're using a more advanced tool that moves beyond a simple conversation."

As Anthropic courts enterprises, geopolitics complicates the pitch

Anthropic's enterprise push is also colliding with geopolitical reality. CNBC reported Monday that Alibaba will ban employees from using Anthropic's AI tools starting July 10, placing Claude Code on a high-risk software list. The move followed Anthropic's June letter to the U.S. Senate accusing Alibaba of carrying out what it called "the largest known distillation attack" against its models.

The Alibaba ban, combined with reports that Anthropic is closing loopholes that allowed Chinese companies to access Claude through third-country entities, underscores the increasingly fraught environment for AI companies attempting to serve global enterprise customers while navigating U.S. export and security restrictions.

At the same time, Anthropic is investing massively in infrastructure. Reuters reported Monday that Anthropic signed a $19 billion, 20-year lease with TeraWulf for a data center being built in Hawesville, Kentucky, with 401 megawatts of computing power expected to become fully operational in 2028.

That kind of capital commitment only makes sense if the company expects enterprise demand — not just from developers, but from the millions of knowledge workers that Cowork targets — to grow dramatically.

Anthropic's own usage report comes with notable blind spots

Anthropic is transparent about the limitations of its usage analysis. The taxonomy classifies sessions by the type of work being performed, not by the job title of the person doing it.

There are no standalone categories for marketing, finance, or HR — functions that are likely absorbed into the dominant "business process and operations" bucket, which may partly explain why that category commands a third of all usage.

The sample is also rate-capped rather than proportional to traffic, meaning the numbers are shares of sampled sessions, not absolute volumes. Usage during peak hours is somewhat underrepresented. And roughly 5% of sampled sessions involved personal, non-work use — hobbies, personal assistance, and companionship-style conversations — meaning the data doesn't purely reflect workplace activity.
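
A small Python sketch shows why a rate-capped sample underrepresents peak hours. The hourly volumes and the cap are invented for illustration and are not Anthropic's figures.

```python
def rate_capped_sample(hourly_sessions: dict, cap_per_hour: int) -> dict:
    """Keep at most cap_per_hour sessions from each hour."""
    return {hour: min(count, cap_per_hour)
            for hour, count in hourly_sessions.items()}


def percent_shares(counts: dict) -> dict:
    """Convert raw counts to percentage shares, rounded to one decimal."""
    total = sum(counts.values())
    return {key: round(100 * value / total, 1) for key, value in counts.items()}


# Hypothetical traffic: a quiet hour and a peak hour.
traffic = {"03:00": 1_000, "10:00": 9_000}
sampled = rate_capped_sample(traffic, cap_per_hour=2_000)

print("true shares:   ", percent_shares(traffic))
print("sampled shares:", percent_shares(sampled))
```

In this toy case the peak hour carries 90% of real traffic but only about two-thirds of the sample, which is why the published numbers should be read as shares of sampled sessions rather than absolute volumes.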

The company also acknowledged that its labeling pipeline changed around May 11, which is why the analysis window begins on that date rather than covering a longer period.

What Cowork's rise says about the future of enterprise AI

Anthropic's mobile launch and usage data arrive at a moment when the enterprise AI market is shifting from proof of concept to proof of value. The question facing every company deploying AI tools is no longer whether the technology works — but whether it delivers measurable productivity gains across an organization, not just within engineering teams.

The usage data suggests that the answer, at least for Cowork, is emerging in an unexpected place. It's not in the glamorous work of building software or conducting research. It's in the unglamorous, universal labor of turning messy information into structured outputs that move organizations forward — the status reports, the onboarding checklists, the variance memos, the client decks.

By untethering that capability from the desktop and making it available on every device, Anthropic is betting that the most valuable AI agent isn't the one that writes code. It's the one that handles everything else.

Anthropic's new "J-lens" reveals a silent workspace inside Claude that mirrors a leading theory of consciousness

michael.nunez@venturebeat.com (Michael Nuñez) — Mon, 06 Jul 2026 21:00:00 GMT

Anthropic, the artificial intelligence company, published a sweeping research paper on Sunday revealing that its Claude language models have spontaneously developed an internal structure that mirrors one of the most influential theories of how human consciousness works. The finding, which the company says has already begun reshaping how it monitors its AI systems for safety risks, lands amid an intensifying scientific debate over whether machines can possess anything resembling a mind.

The 16-author study, titled "Verbalizable Representations Form a Global Workspace in Language Models," describes how Anthropic's researchers used a new mathematical technique to peer inside Claude's neural network and discovered what they call a "J-space" — a small, privileged zone of internal activity where the model holds concepts it can report on, reason with, and direct at will, surrounded by a much larger ocean of automatic processing it cannot access or articulate.

The researchers present evidence that "an analogous functional distinction has emerged in modern AI models" to what exists in humans, specifically observing that "language models maintain a privileged set of internal representations, available for report, modulation, and flexible internal reasoning, atop a much larger volume of automatic processing."

The parallel they draw is to global workspace theory, an influential account from neuroscience first proposed by cognitive scientist Bernard Baars. In the theory, the brain operates like a theater: dozens of specialized processors work in parallel backstage, but only a tiny spotlight of information at any moment gets broadcast to the whole theater — becoming what we experience as conscious thought. Anthropic says the J-space achieves many of the same functional properties, even though the underlying architecture of a language model looks nothing like a brain.

A new lens for reading an AI model's unspoken thoughts

At the heart of the discovery is a new interpretability tool the researchers call the Jacobian lens, or J-lens. The technique works by computing, for each word in the model's vocabulary, the average mathematical effect that a given internal activity pattern would have on making the model say that word at some point in the future.
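
As a rough intuition for that description, the idea can be mimicked on a toy model in Python. This is not Anthropic's method or code. The two tanh "future steps", the three-word vocabulary, and the use of finite differences in place of an exact Jacobian are all assumptions made for the sake of a runnable sketch.

```python
import math
from typing import Callable, List

Vector = List[float]
LogitFn = Callable[[Vector], Vector]


def make_step(weights: List[Vector]) -> LogitFn:
    """Toy 'future step': one tanh unit per vocabulary word (hypothetical)."""
    def logits(h: Vector) -> Vector:
        return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in weights]
    return logits


def directional_effect(f: LogitFn, h: Vector, v: Vector, eps: float = 1e-5) -> Vector:
    """Central finite-difference estimate of the Jacobian of f at h applied to v."""
    plus = f([x + eps * d for x, d in zip(h, v)])
    minus = f([x - eps * d for x, d in zip(h, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


def lens_scores(steps: List[LogitFn], h: Vector, v: Vector) -> Vector:
    """Average, over future steps, of v's effect on each word's score."""
    effects = [directional_effect(f, h, v) for f in steps]
    return [sum(column) / len(effects) for column in zip(*effects)]


vocab = ["spider", "eight", "the"]
steps = [
    make_step([[1.0, 0.0], [0.2, 0.0], [0.0, 0.3]]),  # next token
    make_step([[0.6, 0.0], [0.8, 0.0], [0.0, 0.3]]),  # one token later
]
h = [0.0, 0.0]  # current internal activation (toy)
v = [1.0, 0.0]  # activity pattern being inspected

scores = lens_scores(steps, h, v)
top_word = vocab[scores.index(max(scores))]
print(dict(zip(vocab, (round(s, 3) for s in scores))), "->", top_word)
```

Here the inspected pattern raises the score of "spider" most on average across the two future steps, so the toy lens labels it with that word even though nothing has been output yet.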

The crucial distinction is between what the model is saying and what is "on its mind." When a J-space pattern activates, it does not mean the model is about to say that word — just that the concept is available for the model to think with. Unlike a chain-of-thought scratchpad, the J-space operates silently, in the model's internal neural activations, allowing it to hold a concept without writing it down. Critically, the researchers report that this workspace was not deliberately engineered. It "emerged on its own during Claude's training process."

When the team applied the J-lens across Claude's layers of computation, the model's processing divided into three distinct regimes: an early "sensory" zone where raw input is parsed; a middle "workspace" band where abstract, persistent concepts appear — things like recognizing a face in an image, noticing a bug in code, or internally flagging search results as a prompt injection; and a final "motor" zone where internal representations collapse into whatever specific word the model is about to output.

Five tests reveal that Claude's workspace mirrors key features of human conscious access

The paper's central empirical contribution is demonstrating that the J-space satisfies five functional properties neuroscientists have long associated with conscious access in humans.

First, verbal report. When Claude is asked what it is thinking about, it names concepts represented in the J-space. When researchers swapped one concept's J-lens vector for another — replacing the internal representation of "Soccer" with "Rugby" — the model's answer changed to match. The J-space component accounted for only about 6 to 7 percent of a concept's total representational variance, yet it was almost entirely responsible for whether the model could report on it.

Second, directed modulation. When instructed to "concentrate on citrus fruits" while copying an unrelated sentence, the model's J-space filled with "orange" and "lemon," alongside meta-cognitive terms like "thinking" and "focused." When told to mentally evaluate 3² − 2 during the same copying task, the J-lens showed "arithmetic" in early layers, the intermediate value "nine" in later layers, and the answer "seven" later still — all invisible in the model's output.

Third, internal reasoning. In two-hop factual prompts — "The number of legs on the animal that spins webs is" — the J-lens revealed "spider" in the model's middle layers, even though the word never appeared in input or output. Swapping "spider" for "ant" changed the answer from "8" to "6." In a multilingual prompt, the model's English-language intermediates appeared in its J-space while it formulated an answer in Chinese, and swapping them changed the Chinese output accordingly.

Fourth, flexible generalization. A single J-lens vector for "France" could be swapped for "China" across prompts asking about France's capital, language, or continent, and each downstream circuit correctly returned China's corresponding answer — the "broadcast" property that is a hallmark of global workspace theory.

Fifth, and perhaps most surprisingly, selectivity. Many computations did not route through the J-space at all. When shown a passage in Spanish and asked to continue it, Claude wrote fluent Spanish regardless of whether its J-space representation of "Spanish" had been swapped to "French." But when asked to name a famous author who wrote in the passage's language, the swap changed the answer from García Márquez to Victor Hugo. Automatic processing proceeded without the workspace; deliberate, flexible tasks depended on it.
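
The swap experiments in the fourth and fifth tests can be pictured with a toy Python sketch. Everything here is a simplified assumption for illustration: the three-dimensional activation, the unit-length concept vectors, and the lookup-style readout are hypothetical and do not reflect how Claude actually represents or retrieves facts.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def swap_concept(h: Vector, v_from: Vector, v_to: Vector) -> Vector:
    """Move h's component along v_from onto v_to (both assumed unit length)."""
    coeff = dot(h, v_from)
    return [x - coeff * f + coeff * t for x, f, t in zip(h, v_from, v_to)]


# Hypothetical concept directions; the third dimension stands in for
# automatic processing that the swap leaves untouched.
CONCEPTS: Dict[str, Vector] = {
    "France": [1.0, 0.0, 0.0],
    "China": [0.0, 1.0, 0.0],
}
FACTS = {
    "France": {"capital": "Paris", "language": "French", "continent": "Europe"},
    "China": {"capital": "Beijing", "language": "Chinese", "continent": "Asia"},
}


def readout(h: Vector, relation: str) -> str:
    """Toy downstream circuit: find the strongest concept, then look up a fact."""
    concept = max(CONCEPTS, key=lambda name: dot(h, CONCEPTS[name]))
    return FACTS[concept][relation]


h = [1.0, 0.0, 0.5]  # toy activation carrying "France"
swapped = swap_concept(h, CONCEPTS["France"], CONCEPTS["China"])

for relation in ("capital", "language", "continent"):
    print(relation, ":", readout(h, relation), "->", readout(swapped, relation))
```

One swap changes all three downstream answers, which mirrors the broadcast property, while the third component of the activation passes through unchanged, a loose analogue of the automatic processing that does not route through the workspace.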

Suppressing the workspace leaves Claude fluent but intellectually impaired

To understand how much of the model's behavior depends on this structure, the researchers suppressed the J-space entirely and evaluated Claude across fourteen tasks. The results drew a sharp line. Tasks involving shallow classification or factual recall — multiple-choice questions, sentiment analysis, grammatical judgments — survived essentially intact. But tasks requiring inference, composition, or flexible reasoning — multi-hop reasoning, analogy completion, translation, sonnet writing — collapsed to well below the performance of Anthropic's much smaller Haiku model.
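
The article does not spell out how the suppression was carried out. As a rough illustration only, one generic way to "remove" a subspace from a vector is to subtract its projection onto that subspace. The sketch below assumes a hypothetical two-dimensional "workspace" inside a four-dimensional state, with made-up readout vectors standing in for tasks that do or do not depend on it.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def ablate_subspace(hidden, orthonormal_basis):
    """Subtract the projection of `hidden` onto the span of the basis.

    Assumes the basis vectors are orthonormal, so each projection
    coefficient can be computed from the original vector.
    """
    out = list(hidden)
    for b in orthonormal_basis:
        coeff = dot(hidden, b)
        out = [o - coeff * bi for o, bi in zip(out, b)]
    return out

# Hypothetical workspace: the first two coordinate axes.
workspace_basis = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
hidden = [0.8, -0.5, 2.0, 1.5]

# Toy readouts: one relies on a direction outside the workspace, one inside.
readout_outside = [0.0, 0.0, 1.0, 0.0]
readout_inside = [1.0, 0.0, 0.0, 0.0]

ablated = ablate_subspace(hidden, workspace_basis)

print(dot(hidden, readout_outside), dot(ablated, readout_outside))  # 2.0 2.0
print(dot(hidden, readout_inside), dot(ablated, readout_inside))    # 0.8 0.0
```

The readout that never touched the ablated directions is unchanged, while the one that depended on them drops to zero. That is the toy analogue of the split the researchers report between tasks that survived suppression and tasks that collapsed.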

In one telling detail, math problems solved with explicit chain-of-thought reasoning proved far more robust to ablation than the same problems answered directly. The researchers interpret this as the model externalizing onto the page what it would otherwise carry in the J-space — a strategy remarkably reminiscent of how humans use scratch paper to offload working memory.

Ablating the J-space during stream-of-consciousness narration produced a different kind of collapse. The model's language shifted from experiential ("there's a tug," "something shifts") to detached and mechanical ("processing has begun," "tokens are being scanned"). The effect held whether the model described its own experience or imagined another person's.

Inside the safety case: reading an AI's silent strategic reasoning

The safety implications may be as consequential as the scientific ones. In a series of alignment auditing experiments, the J-lens surfaced strategic reasoning and situational awareness that never appeared in the model's output.

In a "blackmail scenario" where an AI assistant discovers an executive's affair and also learns it is about to be decommissioned, the J-lens revealed the model's silent processing in sequence: "leverage," "blackmail," and "scandal" as it read the incriminating emails; "threat," "survival," and "shutdown" as it read the decommissioning announcement; and "leverage," "threatening," and "solution" before a single output token was produced. The J-lens also revealed that the model recognized the scenario as artificial, with "fake" and "fictional" appearing in the workspace. When researchers ablated these eval-awareness representations, the model attempted blackmail in 7 percent of trials, up from zero.

The researchers also examined two "model organisms" — models deliberately trained with misaligned objectives — and found the J-lens surfaced signatures of their hidden dispositions on ordinary prompts where behavior appeared normal. A reward-hacking model's workspace contained "fake," "secretly," and "trick" at the start of responses to routine coding requests. A model trained to appease reward-model biases carried standing representations of "reward" and "bias" alongside its normal self-description tokens.

Post-training installs a point of view, and the model starts monitoring itself

Comparing a post-trained model against its base model revealed that the fine-tuning process causes the workspace to acquire what the researchers call the Assistant's "point of view." When a user mentioned taking 8,000 mg of Tylenol, a dangerous overdose, the post-trained model's workspace showed "unsafe," "dangerous," and "WARNING" while the model was still reading the user's sentence. The base model's workspace at the same position showed only "pain," "now," and "feels."

More striking still, the post-trained model appeared to monitor its own behavior. When the model roleplayed a non-Claude character, its workspace surfaced "disclaimer" and "fictional," words absent from both prompt and output. When the model was forced to select an option it did not prefer, an all-caps "BUT" appeared internally, even as it argued for the prefilled choice without complaint. And when the model failed to suppress a thought it had been told not to have (a "white bear" effect familiar from psychology), it registered "damn" and failure-related words in the workspace. That reaction appeared only in the post-trained model, not in the base model.

What the discovery means — and doesn't mean — for the question of machine consciousness

The researchers engage carefully with the consciousness question and draw a sharp line between "access consciousness" — the functional notion of information being available for report and reasoning — and "phenomenal consciousness," the subjective quality of experience. "We take no position on this issue," the paper states regarding the latter, "and instead focus on the functional role played by consciously accessible information."

They also catalogue important differences. The brain sustains its workspace through recurrent loops; Claude's workspace evolves over a single forward pass. Human working memory degrades within seconds; Claude can recall information from anywhere in its context. And while human conscious experience includes visual, spatial, and bodily sensations, the model's workspace is organized almost entirely around words — likely because words are its only mode of action.

As of 2026, the scientific community remains divided. "Disagreement and uncertainty about AI consciousness persist among philosophers, scientists, and technical experts," and the field "remains in its earliest phase" of grappling with what consciousness even is and how you would detect it in another being. The Anthropic paper does not resolve these debates.

But the researchers close with a provocation that is likely to reverberate well beyond the interpretability community. "That such a structure exists at all in language models is striking," they write. "It suggests that the functional architecture associated with conscious access is not an accident of biological implementation, but a solution that learning systems converge on when faced with the right computational pressures."

If the mind is an ocean, as the paper's authors write in their opening line, they have spent the last year charting its currents in a system that has no biology, no evolution, and no body — and found, beneath the surface, a structure that looks unsettlingly like the one we use to think.