<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>Business | VentureBeat</title>
        <link>https://venturebeat.com/category/business/feed/</link>
        <description>Transformative tech coverage that matters</description>
        <lastBuildDate>Fri, 22 May 2026 19:19:38 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright 2026, VentureBeat</copyright>
        <item>
            <title><![CDATA[Resolve AI says the AI coding boom is breaking production systems. It wants to fix that.]]></title>
            <link>https://venturebeat.com/technology/resolve-ai-says-the-ai-coding-boom-is-breaking-production-systems-it-wants-to-fix-that</link>
            <guid isPermaLink="false">2CWE7cYEebqGEtXUzp8FzS</guid>
            <pubDate>Thu, 21 May 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[<p><a href="https://resolve.ai/">Resolve AI</a>, the production-operations startup backed by <a href="https://greylock.com/">Greylock</a> and <a href="https://lsvp.com/">Lightspeed Venture Partners</a>, today announced a sweeping expansion of its platform that introduces always-on background agents, a redesigned investigation architecture, and a shared workspace where engineers and AI agents collaborate in real time on live incidents.</p><p>The centerpiece of the release is a new multi-agent investigation system developed by Resolve AI&#x27;s in-house research lab. Instead of deploying a single AI agent to diagnose a production failure — analogous to a lone engineer pulling an on-call shift — the platform now dispatches a coordinated team of specialized agents that pursue multiple hypotheses in parallel, independently verify each other&#x27;s conclusions, and construct complete causal chains from root cause to symptom. The company says the architecture delivers more than a twofold improvement in root cause accuracy on its internal evaluation benchmarks compared to earlier versions of its platform.</p><p>&quot;Think of a single agent being on call, the way a human would be,&quot; Resolve AI CEO and co-founder Spiros Xanthos told VentureBeat in an exclusive interview ahead of the announcement. &quot;We now have a team of agents that all work together, almost like a team of humans debugging an issue, and that has improved quality by 2x.&quot;</p><p>The announcement arrives at a moment of acute tension in the software industry. AI-powered code generation has exploded in adoption, enabling engineering teams to ship dramatically more software than they could two years ago. But keeping that software running in production — debugging it when it breaks, monitoring it after deployment, auditing its health — remains overwhelmingly manual. 
For a company that raised a <a href="https://resolve.ai/news/resolveai-raises-125-million-series-a">$125 million Series A</a> at a $1 billion valuation earlier this year, Resolve AI is making a direct bet that the operational side of the software lifecycle is the next major frontier for AI investment.</p><h2><b>What hundreds of real-world test cases reveal about the accuracy claim</b></h2><p>Any accuracy claim from a startup warrants scrutiny, and Xanthos was candid about both the scale and limitations of the evaluation. The 2x figure comes from internal benchmarks, not a third-party audit, though the evaluation set was built to mirror the complexity that Resolve AI&#x27;s enterprise customers encounter daily.</p><p>&quot;These are very hard, complex evals that we built over time to represent real-world examples,&quot; Xanthos explained. &quot;This is not customer data, but these evals represent difficult cases similar to what we&#x27;ve seen at some of the largest tech companies we work with.&quot; He described the set as comprising hundreds of cases that reflect the kinds of production failures encountered at companies like <a href="https://www.coinbase.com/">Coinbase</a>, <a href="https://www.salesforce.com/">Salesforce</a>, <a href="https://www.doordash.com/">DoorDash</a>, and <a href="https://www.zscaler.com/">Zscaler</a> — all named <a href="https://resolve.ai/">Resolve AI</a> customers.</p><p>The practical impact of that accuracy gain is significant. Resolve AI&#x27;s agents now act as first responders for every on-call alert, typically triaging within five minutes before a human engineer even becomes involved. In previous public disclosures, the company has cited DoorDash reducing time to root cause by up to 87 percent. When asked to contextualize that figure, Xanthos described the typical baseline.</p><p>&quot;When something goes wrong, it might take five to 10 minutes for a human to even get their laptop and connect,&quot; he said. 
&quot;The typical MTTR is in the tens of minutes, sometimes hours, depending on severity. So an improvement of 80-plus percent — four to five times faster — is actually huge. It&#x27;s something we&#x27;ve never achieved before with AI, tools, data, or observability.&quot;</p><h2><b>How AI agents fact-check each other to prevent hallucinated root causes</b></h2><p>One of the core challenges in applying large language models to high-stakes production environments is their tendency to generate plausible-sounding but incorrect answers — a failure mode that, in the context of a live outage, could send an engineering team chasing the wrong fix while a service stays down.</p><p>Xanthos acknowledged this directly. &quot;This is a very common issue with models out of the box,&quot; he said. &quot;They always try to give you an answer, and if they don&#x27;t have enough evidence, they&#x27;ll give you the best possible answer — which is likely to be wrong.&quot;</p><p>Resolve AI&#x27;s countermeasure is a system of layered verification among its agents. Each agent investigating a hypothesis must cite every piece of evidence it relies on and present that evidence to another agent for independent review. The investigating agent must construct the full causal chain — from root cause to symptom — and peer agents actively attempt to disprove the theory by identifying gaps in the logic.</p><p>&quot;Often, agents actually disprove those theories because they find gaps,&quot; Xanthos said. &quot;There are many layers of defense and agentic checks that allow Resolve to be very accurate and not mislead.&quot;</p><p>Equally important, he said, is the system&#x27;s willingness to say it does not know. &quot;The bar to actually saying &#x27;I have the answer&#x27; is very high. In those cases, it will say, &#x27;This is the evidence I found. 
Here are three or four paths you can take from here, but I wasn&#x27;t able to fully prove that this is the problem.&#x27; A system like this that operates in production cannot be a black box.&quot; In domains where wrong answers carry operational consequences, calibrated uncertainty can be more valuable than confident outputs. For an AI system integrated into an incident-response workflow, confidently pointing engineers in the wrong direction during a customer-facing outage could compound the very harm it was designed to prevent.</p><h2><b>Inside the new background agents that never go off-call</b></h2><p>Beyond incident response, Resolve AI is introducing a new class of background agents designed to handle the continuous, often invisible operational work that engineering teams are expected to perform but struggle to sustain at scale.</p><p>These agents run on schedules or wake automatically in response to events — a new deployment, a fired alert, a merged pull request — and accumulate institutional knowledge from every investigation and human interaction over time. When an engineer opens the Resolve AI interface, agents have already been working: pre-investigating priority issues, monitoring deployments, auditing alert hygiene, flagging configuration drift, and surfacing cost anomalies.</p><p>Xanthos drew a distinction between background agents and the incident-response agents that have been Resolve AI&#x27;s primary offering. &quot;You can now have these agents run in the background at all times — not only when a human asks an agent to debug a problem or when an alert fires,&quot; he said. &quot;A lot of our customers are now monitoring changes that land in production before they cause an issue. 
There&#x27;s an agent that monitors those all the time.&quot;</p><p>He described these background agents as &quot;general-purpose SRE agents that are available to every developer,&quot; capable of handling tasks that range from monitoring infrastructure changes that might increase cloud costs to performing post-incident follow-up work like generating code fixes based on incident learnings. The concept addresses a structural problem in software operations: the daily tasks required to keep production systems healthy — monitoring deployments, investigating alerts, tracking changes across complex environments — are critical but reactive and manual. Engineering organizations know this work needs to happen, but it competes for attention with feature development. Automated agents that perform this work continuously could shift teams from reactive firefighting to proactive operational management.</p><h2><b>The shared workspace where engineers and AI agents investigate together</b></h2><p>The third major component of the release is what the company calls a shared investigation surface — a workspace where engineers and AI agents work from the same live evidence during an active incident. Reports update dynamically as investigations evolve. Every finding is inspectable. Engineers can explore side investigations without interrupting the primary workflow. Source queries are pullable and modifiable in place, evidence is embedded directly into the workspace, and remediation actions can be triggered from the same interface without switching tools.</p><p>&quot;Think of it as an interface to all the production tools, but also an interface where humans and agents can collaborate with each other — or agents with agents,&quot; Xanthos said. 
&quot;That&#x27;s what gradually leads to more trust and more automation, because you work with the agent, you teach it, you see the results.&quot;</p><p>The company is also making its platform available as a <a href="https://www.ibm.com/think/topics/rest-apis">REST API</a> and an <a href="https://modelcontextprotocol.io/docs/getting-started/intro">MCP (Model Context Protocol) server</a>, enabling engineering teams to integrate Resolve AI into broader agentic workflows and infrastructure. According to Xanthos, this is already happening in practice. &quot;A general-purpose agent that a company has built — when it comes to debugging, that agent could invoke Resolve,&quot; he said. &quot;Or somebody works on their coding agent on the laptop, and Resolve shows up there as an MCP. If there is some production-related activity, the coding agent can invoke it.&quot; The interoperability play signals that Resolve AI sees itself not as a closed system but as a specialized node in a broader ecosystem of AI agents that will increasingly hand off tasks to one another — a pattern Xanthos compared to the open architecture of the web rather than the walled-garden model of an app store.</p><h2><b>Why Resolve AI says it can outperform Datadog, PagerDuty, and the cloud giants</b></h2><p>The agentic operations space has become crowded in the past year. <a href="https://www.datadoghq.com/">Datadog</a>, <a href="https://www.pagerduty.com/">PagerDuty</a>, and major cloud providers have all announced AI-augmented operations capabilities. When asked what separates Resolve AI from these incumbents, Xanthos pointed to the depth of the company&#x27;s technical foundation.</p><p>&quot;We&#x27;re operating at the frontier here. There&#x27;s no blueprint for how you build a system like Resolve,&quot; he said. 
He noted that he and co-founder Mayank Agarwal co-created OpenTelemetry, the most widely adopted open-source project in observability, which now serves as the de facto standard for collecting metrics, logs, and traces from modern software systems.</p><p>Xanthos also highlighted the company&#x27;s recent AI Lab, led by a researcher he described as the former post-training lead for Meta&#x27;s Llama models. &quot;He managed to combine deep expertise of production observability with AI and models, and I think that&#x27;s very unique,&quot; Xanthos said. &quot;I don&#x27;t believe any other company, whether it comes from an observability background or it&#x27;s a startup, has all of that together.&quot;</p><p>The company&#x27;s structural defenses, according to Xanthos, include a full environment model that Resolve builds for each customer, a memory system that learns within the customer&#x27;s specific production environment, and its multi-agent architecture. The lab is now post-training frontier models on production-specific data — the kind of procedural knowledge that experienced engineers use to debug production issues but that does not appear in standard model training sets. This approach reflects an increasingly common pattern among AI application companies: using frontier foundation models as a base layer but investing heavily in domain-specific fine-tuning, retrieval, and agent architectures to achieve accuracy levels that general-purpose models cannot reach alone.</p><h2><b>How outcome-based pricing changes the economics of AI in production</b></h2><p>Resolve AI&#x27;s pricing model departs from traditional enterprise software licensing. The company sells credits that are consumed when agents perform work — an outcome-based approach that ties cost directly to value delivered.</p><p>&quot;We&#x27;re not selling software,&quot; Xanthos said. &quot;The way you buy and use Resolve is by buying credits that are consumed when Resolve performs an action. 
It&#x27;s outcome-based. Only when Resolve troubleshoots an alert — that&#x27;s the only time that it consumes credits.&quot;</p><p>He addressed the cost question head-on, arguing that Resolve AI is actually cheaper than the alternative of building a similar system from scratch using frontier models and MCP integrations. &quot;If you were to take Opus or GPT-5.4 and try to build a solution like Resolve with MCPs, we measured — you actually end up consuming a lot more in tokens than what you have to pay Resolve, because our system is very optimized in terms of context, in terms of how it reads time-series data.&quot;</p><p>As for the always-on background agents, Xanthos said their continuous nature does not inherently add to cost. &quot;The background agent doesn&#x27;t mean it does intensive work all the time. It means that it can be there; you can give it any task you want. A lot of these tasks are triggered based on some action — an alert happens, somebody merges a PR, and you want to see if it has an impact on production.&quot; For enterprise customers in regulated industries — the Coinbases and Zscalers of the world — data residency and security are non-negotiable. Resolve AI accommodates this with a flexible deployment model: the data plane sits wherever the customer&#x27;s existing tools already live, while the inference layer can run as a standard SaaS deployment or inside a customer-specific VPC. &quot;We designed Resolve to work with the large enterprises where security standards are the highest,&quot; Xanthos said. 
&quot;There are many measures we take to ensure Resolve is secure, including not retaining data.&quot;</p><h2><b>Why engineering leaders are slowly learning to trust AI agents with production systems</b></h2><p>The question of whether engineering teams will trust AI agents to take autonomous action in production — rolling back a deployment, adding capacity, generating a pull request — is one of the defining cultural challenges of this technology wave. Xanthos drew an analogy to autonomous vehicles.</p><p>&quot;For us to allow a car to drive on its own on the street, we have to prove that it&#x27;s safer than a human. Agents in production is a very similar concept,&quot; he said. He acknowledged that not every customer is comfortable with agents taking automated action, but described a gradient of trust that he expects to evolve rapidly.</p><p>&quot;There is a set of actions that are relatively risk-free that most tech companies probably are comfortable having an agent take, and probably there is another set of actions for which the human has to approve,&quot; he said. &quot;But as quality keeps climbing the way we see at Resolve, I would say we&#x27;re going to cross the threshold this year where most of the actions will be taken by an agent automatically.&quot;</p><p>He described the typical adoption arc: companies begin with agents providing recommendations, then a human decides whether to press the button. Over weeks or months, trust builds incrementally. &quot;I don&#x27;t think this is a problem where we just let the agents run wild from the beginning,&quot; Xanthos said. 
The incremental approach mirrors how enterprise technology adoption has always worked — from cloud migration to container orchestration, organizations move at the speed of trust, not the speed of capability.</p><h2><b>The argument that AI-generated code is making the production crisis worse, not better</b></h2><p>Perhaps the most provocative argument in Resolve AI&#x27;s thesis is that the explosion of AI-generated code is actually intensifying the production-operations problem. In a <a href="https://www.linkedin.com/posts/spiros_resolve-ai-co-founder-ceo-spiros-xanthos-activity-7454562529361694720-0W4q/">recent LinkedIn post</a>, Xanthos framed the dynamic in stark terms, arguing that engineering leaders who celebrate faster code shipping without investing in production operations are effectively having their senior engineers &quot;subsidize velocity&quot; through increased incident-response burden.</p><p>In his interview with VentureBeat, he returned to this theme. &quot;Now that coding agents are producing code, we produce a lot more code that we&#x27;re less familiar with — humans are less familiar with — so you need the AI to be the defense,&quot; he said.</p><p>This framing positions <a href="https://resolve.ai/">Resolve AI</a> not merely as a productivity tool but as a necessary counterweight to the AI coding revolution. As organizations deploy more code, written by tools that their engineers may not fully understand, running against production systems those engineers did not build, the argument is that the operational complexity — and the consequences of failure — will grow proportionally. On the <a href="https://stackoverflow.blog/podcast/">Stack Overflow Podcast</a> last October, Xanthos put numbers to this claim, estimating that engineers spend upwards of 70 percent of their time maintaining and troubleshooting production systems rather than building new features. 
&quot;We&#x27;re facing a new crisis where we&#x27;re building faster than we can operate,&quot; he said in that conversation.</p><p><a href="https://resolve.ai/">Resolve AI</a> was founded in early 2024 by Xanthos and Agarwal, who first met during their PhD programs at the University of Illinois and have worked together for more than a decade. Xanthos previously co-founded <a href="https://patterninsight.com/blog/blog-post/2012/08/07/log-insight-has-been-acquired-by-vmware/">Pattern Insight</a> (acquired by VMware) and <a href="https://www.splunk.com/en_us/blog/leadership/splunk-to-acquire-observability-innovator-and-leading-open-source-contributor-omnition.html">Omnition</a> (acquired by Splunk), where the pair helped create <a href="https://opentelemetry.io/">OpenTelemetry</a>. The company raised a <a href="https://www.reuters.com/technology/artificial-intelligence/greylock-backed-resolve-ai-raises-35-million-seed-funding-help-engineers-2024-10-01/">$35 million seed round</a> from Greylock in 2024, followed by the <a href="https://resolve.ai/news/resolveai-raises-125-million-series-a">$125 million Series A</a> led by Lightspeed at a $1 billion valuation earlier this year. Named customers include <a href="https://coinbase.com/">Coinbase</a>, <a href="https://doordash.com/">DoorDash</a>, <a href="https://www.msci.com/">MSCI</a>, <a href="https://www.salesforce.com/">Salesforce</a>, <a href="https://www.mongodb.com/">MongoDB</a>, and <a href="https://www.zscaler.com/">Zscaler</a>.</p><p>Xanthos&#x27;s long-term vision is expansive. &quot;Over the long run, once agent ability surpasses that of a human software engineer, the end result is a lot more technology and a lot more software,&quot; he said. &quot;It&#x27;s not actually fewer people working on it. It&#x27;s technology becoming cheaper, becoming more accessible, producing a lot more technology for the benefit of the world.&quot;</p><p>That vision will take years to realize. 
But the more immediate promise of today&#x27;s announcement comes down to something every on-call engineer understands viscerally: the 2 a.m. page, the scramble for a laptop, the frantic search through dashboards and logs for an answer that might take minutes or might take hours. Resolve AI is betting that the next time that alert fires, a team of agents will have already investigated, verified, and documented the root cause before the engineer&#x27;s phone even lights up. For a profession that has long measured its nights by mean time to resolution, the question is no longer whether AI can help — it is whether engineers will let it.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Infrastructure</category>
            <category>Business</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/6mg96DUMYtWkBE0e6ywoJH/d0f7b1a3861be6fc94961cac14bd3f61/Nuneybits_vector_art_of_a_small_neon_yellow-green_robot_sitting_e22d33a7-664c-4560-b2bd-a36d0e4356fb.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Cerebras says its chips run a trillion-parameter AI model nearly 7 times faster than GPU clouds]]></title>
            <link>https://venturebeat.com/technology/cerebras-says-its-chips-run-a-trillion-parameter-ai-model-nearly-7-times-faster-than-gpu-clouds</link>
            <guid isPermaLink="false">4DGR05wu9TQgbNghbL8f1h</guid>
            <pubDate>Wed, 20 May 2026 19:59:22 GMT</pubDate>
            <description><![CDATA[<p>Less than a week after completing the largest tech IPO of 2026, <a href="https://www.cerebras.ai/">Cerebras Systems</a> is making its most aggressive play yet to dominate the fast-growing AI inference market. On Monday, the Sunnyvale-based chipmaker announced that it is now running <a href="https://www.kimi.com/ai-models/kimi-k2-6">Kimi K2.6</a> — a trillion-parameter open-weight model developed by Beijing-based <a href="https://www.moonshot.ai/">Moonshot AI</a> — for enterprise customers at nearly 1,000 tokens per second, a speed no GPU-based provider has come close to matching.</p><p>The result, independently verified by benchmarking firm <a href="https://artificialanalysis.ai/">Artificial Analysis</a>, clocked in at 981 output tokens per second, making Cerebras 6.7 times faster than the next-fastest GPU-based cloud provider and 23 times faster than the median. For a standard agentic coding request involving 10,000 input tokens, Cerebras delivered the full response — including prompt processing, reasoning, and 500 output tokens — in 5.6 seconds, compared to 163.7 seconds on the official Kimi endpoint. That’s a 29-fold improvement in time to final answer.</p><p>&quot;We&#x27;re really wanting to be very clear and show that we can do the largest models,&quot; James Wang, Cerebras&#x27; director of product marketing, told VentureBeat in an exclusive interview ahead of the announcement. &quot;In this case, Kimi K2.6 — a trillion-parameter MoE model on the wafer-scale architecture — and it runs also at this same incredible speed that we&#x27;re famous for.&quot;</p><p>The announcement marks a critical inflection point for Cerebras, which has long battled a perception that its unorthodox wafer-scale chips, while blindingly fast, could only handle small and mid-sized models. <a href="https://www.kimi.com/ai-models/kimi-k2-6">Kimi K2.6</a> is the first trillion-parameter open-weight model the company has ever served in production. 
And with a freshly minted $95 billion market cap and $5.55 billion in IPO proceeds burning a hole in its balance sheet, Cerebras is signaling to Wall Street that it intends to compete not just at the frontier of speed, but at the frontier of model scale.</p><h2><b>Why Cerebras chose a Chinese-built model as its trillion-parameter flagship</b></h2><p>The choice of <a href="https://www.kimi.com/ai-models/kimi-k2-6">Kimi K2.6</a> reflects both a technical milestone and a commercial calculus. Released on April 20 by Moonshot AI — a Beijing-based company founded in 2023 by Tsinghua University alumni and dubbed one of China&#x27;s &quot;AI Tiger&quot; companies — K2.6 is a trillion-parameter Mixture-of-Experts model that has rapidly established itself as the most capable open-weight model available for coding and agentic tasks. The model tops <a href="https://labs.scale.com/leaderboard/swe_bench_pro_public">SWE-Bench Pro</a> at 58.6, outperforming Claude Opus 4.6 and matching GPT-5.4, while posting leading scores on agentic benchmarks like <a href="https://agi.safe.ai/">Humanity&#x27;s Last Exam</a> and <a href="https://huggingface.co/datasets/google/deepsearchqa">DeepSearchQA</a>. Its architecture uses 32 billion activated parameters per token out of a total of 1 trillion, with 384 experts, of which 8 are selected plus 1 shared per forward pass, operating over a 256,000-token context window.</p><p>In practical terms, K2.6 is one of the first open-weight models that enterprises can plausibly use as a drop-in replacement for expensive, capacity-constrained closed-source APIs from <a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool">Anthropic</a> and <a href="https://openai.com/api/">OpenAI</a> — particularly for the coding and agentic workloads that have become the highest-value application of large language models. 
The 2.6 release extends the model&#x27;s capabilities from front-end design into full-stack workflows, including authentication, database operations, and long-horizon agent execution.</p><p>Wang was blunt about what is driving enterprise interest. &quot;They&#x27;re very motivated, first of all, to have an alternative to Anthropic,&quot; he told VentureBeat. &quot;Anthropic&#x27;s models are fantastic. I use them. I&#x27;m sure you probably use them. But they&#x27;re quite expensive, and they&#x27;re constantly running out of capacity.&quot; He described a personal experience in which an application running on Anthropic&#x27;s API failed over a weekend because it ran out of capacity, an anecdote that, he said, resonates deeply with enterprise buyers.</p><p>The geopolitical dimension of this arrangement is worth noting, however. Kimi K2.6 is a Chinese-developed model being served by an American chipmaker to American enterprise customers. <a href="https://www.moonshot.ai/">Moonshot AI</a> operates out of Beijing, and K2.6&#x27;s adoption in the West arrives during a period of heightened scrutiny of Chinese AI companies in the U.S. market. Enterprise buyers with strict compliance requirements, particularly those in financial services, healthcare, and defense, will need to evaluate this dimension alongside the model&#x27;s technical capabilities.</p><h2><b>How wafer-scale chips solve the trillion-parameter speed problem that GPUs cannot</b></h2><p>Explaining why Cerebras can achieve these speeds requires understanding what makes its hardware fundamentally different from anything else on the market. Most AI inference today runs on clusters of Nvidia GPUs, typically organized in racks of 72, which Nvidia markets as the <a href="https://www.nvidia.com/en-us/data-center/gb200-nvl72/">NVL72 configuration</a>. In these setups, the model&#x27;s parameters are distributed across many discrete chips connected by high-speed networking fabric. 
Data must constantly shuttle between chips, and the interconnect bandwidth between GPUs becomes a bottleneck, particularly for large models with hundreds of billions or trillions of parameters.</p><p>Cerebras takes a radically different approach. Its <a href="https://www.cerebras.ai/chip">Wafer-Scale Engine 3</a> is a single chip the size of an entire silicon wafer — roughly the size of a dinner plate — containing 44 gigabytes of on-chip SRAM. Unlike the high-bandwidth memory used in GPUs, SRAM sits directly on the processor die, offering dramatically lower latency and higher bandwidth for data access. For Kimi K2.6, Cerebras stores the model&#x27;s weights in their original 4-bit precision while performing computation at 16-bit floating point. The weights are distributed across multiple wafers in a cluster of approximately 20 CS-3 systems, with activations streamed between them. Critically, all the experts for a given MoE layer are placed on the same wafer, meaning the all-to-all communication required for expert routing happens at SRAM speeds. According to Cerebras&#x27; technical description, the on-wafer network fabric delivers over 200 times the bandwidth of NVLink on NVL72.</p><p>Wang explained the architecture using an analogy. &quot;Our single units are much larger and much higher capacity — they&#x27;re on the order of 20 racks, as opposed to 72 GPUs,&quot; he said. Each layer in the transformer can, in effect, serve a separate user simultaneously. &quot;They&#x27;re just like a queue, like you&#x27;re queuing for bagels or something — they&#x27;re all occupying a different part of the hardware. 
But because they move across so fast, the actual experience, tokens per second, single user, on your end is still what you&#x27;re used to.&quot; Combined with custom kernels and speculative decoding, this allows Cerebras to serve the trillion-parameter MoE model at close to 1,000 tokens per second — a speed the company calls a world record achievable only with wafer-scale hardware.</p><h2><b>Fortune 500 companies are already testing Cerebras&#x27; trillion-parameter inference in production</b></h2><p>Cerebras is not opening K2.6 to the general public. Instead, the company is positioning this as an enterprise-first offering, with Fortune 500 companies in software, financial services, and healthcare currently running cloud trials of their production workloads on the platform. &quot;These are logos that you&#x27;ve definitely heard of,&quot; Wang said, though he declined to identify specific customers due to confidentiality agreements.</p><p>The enterprise-first approach is deliberate. Cerebras has historically prioritized its largest customers over its consumer-facing API, in part because of hardware capacity constraints. &quot;Everyone is in a capacity crunch. We prioritize our enterprise customers, so we don&#x27;t show it in the consumer-facing gateway or the API, where you get very unpredictable traffic, where a single user can, in effect, take over your whole cluster,&quot; Wang explained. Serving K2.6 also limits the company&#x27;s ability to simultaneously offer other large models. &quot;We can&#x27;t simultaneously, you know, have six other models,&quot; he acknowledged. &quot;It&#x27;s just kind of a mutual constraint of reality.&quot;</p><p>On pricing, Wang said that while the enterprise deployment does not carry public pricing, the company&#x27;s costs are broadly competitive with GPU-based providers. 
&quot;On all the models we have served with pricing, the pricing is very comparable — maybe in the middle, kind of middle-upper range of GPU pricing,&quot; he said. &quot;It&#x27;s not like, because we run fast, it costs many, many fold more.&quot; He drew a line, however, at the lowest end of the market: if you are willing to run K2.6 at 20 tokens per second on bargain GPU infrastructure, Cerebras will not try to compete on price. &quot;We&#x27;re an automaker in the pickup truck market. We don&#x27;t do that market,&quot; Wang said. For speed-sensitive workloads — particularly agentic coding, where developers wait in real time for the model to generate and iterate on code — the value proposition is straightforward: comparable per-token cost, but an order of magnitude faster delivery.</p><h2><b>The competitive threat from Nvidia&#x27;s $20 billion Groq acquisition looms large</b></h2><p>Cerebras&#x27; <a href="https://www.cerebras.ai/blog/cerebras-kimi-k2-Enterprise">announcement</a> arrives at a pivotal moment in the AI chip industry, one in which the inference market is rapidly overtaking training as the most commercially important compute workload. As AI agents proliferate in enterprise software, the speed of inference directly determines how useful those agents are in practice — and the competitive pressures are intensifying accordingly.</p><p>The most significant competitive development in recent months was <a href="https://groq.com/newsroom/groq-and-nvidia-enter-non-exclusive-inference-technology-licensing-agreement-to-accelerate-ai-inference-at-global-scale">Nvidia&#x27;s acquisition of Groq for $20 billion</a>, a deal that gave the GPU giant access to proprietary inference technology built around specialized Language Processing Units. Wang referenced the deal directly. &quot;I think Nvidia is now sensing fast inference is an extremely important market,&quot; he told VentureBeat. 
&quot;That&#x27;s why they&#x27;re willing to spend $20 billion on acquiring a company like that.&quot;</p><p>But Wang expressed confidence that Cerebras&#x27; <a href="https://www.cerebras.ai/whitepapers">architectural advantages</a> are durable. Both Nvidia and Cerebras operate on roughly annual hardware refresh cycles. &quot;We refresh our hardware on a periodic cycle. You will hear some news about that from us soon,&quot; Wang said, hinting at a forthcoming hardware announcement without providing details. On the software side, Wang pointed to the company&#x27;s track record of rapidly adapting to the fast-evolving open-weight model ecosystem. &quot;We started with Llama, we supported all the Qwen models, and then when developers told us they wanted GLM, we brought GLM online. And now they&#x27;re telling us Kimi is the best — so we&#x27;re giving them Kimi,&quot; he said. &quot;At the same time, we&#x27;ve also supported the best companies in running their closed models — OpenAI, Cognition, Mistral.&quot;</p><p>The mention of OpenAI underscores one of the most unusual business relationships in the AI industry. <a href="https://www.cerebras.ai/blog/openai-partners-with-cerebras-to-bring-high-speed-inference-to-the-mainstream">OpenAI and Cerebras struck a deal in early 2026</a> reportedly worth more than $20 billion for computing capacity and related services. Wang confirmed that Cerebras serves OpenAI&#x27;s &quot;internal coding models forthcoming&quot; but declined to disclose specifics, as neither party has publicly detailed the technical arrangement.</p><h2><b>Inside Cerebras&#x27; plan to serve the smartest AI models faster than anyone else</b></h2><p>Wang framed the K2.6 deployment as a stepping stone, not a destination. Cerebras started serving inference in late 2024 with relatively small models and has spent over a year scaling from 70 billion parameters to 1 trillion-plus. &quot;We couldn&#x27;t have launched that in November 2024,&quot; he said. 
&quot;But we&#x27;re there now.&quot;</p><p>The company&#x27;s next challenge is to move from serving the best open-weight frontier model to serving the best frontier models, period — including closed-source models from the likes of <a href="https://www.anthropic.com/">Anthropic</a> and <a href="https://openai.com/">OpenAI</a> that sit at the absolute top of the intelligence leaderboards. &quot;This is the first open-weight frontier one that we now have clear demonstrated evidence for,&quot; Wang said. &quot;I think over the course of the year, you will see us serving true frontier, frontier at the speed that we&#x27;re famous for. And you should hold us up for that.&quot;</p><p>When asked whether the current rollout would be overtaken by the pace of hardware improvement at Nvidia and others, Wang was unfazed. &quot;Nvidia has a very clear roadmap. They publish every year at GTC. They&#x27;re roughly on a yearly product cycle, and so are we. You will hear some news about that from us soon,&quot; he said, hinting at new hardware without offering details.</p><p>He also addressed the question of vendor lock-in — a concern that any CTO evaluating a single-vendor inference provider would raise. &quot;These enterprises rarely commit fully to one vendor,&quot; Wang said. &quot;They have strategies to make sure that some traffic can go to us, some traffic can go to someone else, and there&#x27;s load balancing between the two. This is not a new problem. This is just generally how you manage cloud resources.&quot;</p><p>The pitch, ultimately, is about more than speeds and feeds. Wang sees the AI industry converging on a world in which autonomous agents — not human developers — are the primary consumers of inference compute, and in which the speed of those agents determines competitive outcomes for the companies that deploy them. &quot;The world economy is kind of getting rebuilt on agents,&quot; Wang said. 
&quot;Speed will determine who wins or loses.&quot;</p><p>It is a bold claim from a company that, until last week, had never traded on a public exchange. But for Cerebras, the logic is straightforward: if the future of enterprise software is built by AI agents that think at the speed of their hardware, then the company that provides the fastest hardware provides the fastest thinking. And in a market where enterprises are spending billions to shave seconds off their AI response times, a company that can serve a trillion-parameter model in the time it takes to pour a cup of coffee might just have the most compelling pitch in Silicon Valley.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Infrastructure</category>
            <category>Business</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/2THfHJ5QDM7v6hfZguZzRK/88414c532761a606e6faad8056f896ac/Nuneybits_Vector_art_of_cobalt_chip_towering_servers_in_burnt_o_e4e68375-d5c6-4559-87a7-d92ffb2bf67a-1.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.]]></title>
            <link>https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think</link>
            <guid isPermaLink="false">5M2gv6mk7MEHXAvBa4H7z0</guid>
            <pubDate>Tue, 19 May 2026 17:45:00 GMT</pubDate>
            <description><![CDATA[<p>For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list of blue links. On Tuesday, Google formally retired that paradigm.</p><p>At its annual <a href="https://io.google/2026/">I/O developer conference</a>, Google announced a <a href="https://blog.google/products-and-platforms/products/search/search-io-2026/">sweeping redesign</a> of the search box itself — the literal text field where billions of queries begin every day — transforming it from a simple keyword input into a dynamic, AI-driven conversation starter that can accept text, images, PDFs, videos, and even open Chrome tabs as inputs. The company is also merging its <a href="https://search.google/ways-to-search/ai-overviews/">AI Overviews</a> and <a href="https://search.google/ways-to-search/ai-mode/">AI Mode</a> features into a single, seamless search flow, eliminating the friction that previously forced users to choose between a traditional results page and an AI-forward experience.</p><p>Liz Reid, Google&#x27;s vice president and head of Search, called it &quot;the biggest upgrade to our iconic search box since its debut over 25 years ago&quot; during a press briefing on Monday.</p><p>The announcement arrived alongside a blizzard of other news — new <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">Gemini models</a>, a personal <a href="https://blog.google/products-and-platforms/products/search/search-io-2026/">AI agent called Spark</a>, an intelligent <a href="https://blog.google/products-and-platforms/products/shopping/google-shopping-cart/">shopping cart</a>, a <a href="https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights/">reimagined developer platform</a> — but the search box redesign may prove to be the most consequential. 
It is the clearest signal yet that Google views the future of its flagship product not as a place where users type fragmented keywords, but as an interface where they hold open-ended, multimodal conversations with an AI system backed by the entire web.</p><h2><b>The new search box expands, accepts files, and coaches you on what to ask</b></h2><p>The changes mark a fundamental shift in how Google expects people to interact with the product that generates the vast majority of Alphabet&#x27;s revenue.</p><p>The box itself now dynamically expands to accommodate longer, more conversational queries. Where the old interface subtly encouraged brevity — a narrow field suited to two- or three-word keyword strings — the new design invites users to fully articulate complex questions in granular detail. It also now supports multimodal inputs directly. Users can upload images, PDFs, files, and videos, or drag in content from Chrome tabs, right from the main search interface. Previously, some of these capabilities existed in AI Mode, but reaching them required extra steps. Now they sit at the primary entry point.</p><p>Google is also deploying what it describes as an AI-powered query suggestion system that &quot;goes beyond autocomplete.&quot; Rather than simply predicting the next word a user might type based on popular searches, the system helps users formulate complex, nuanced queries — essentially coaching them toward the kind of detailed questions that AI Mode handles best.</p><p>The new search box is starting to roll out immediately in all countries and languages where AI Mode is available.</p><h2><b>Google is merging AI Overviews and AI Mode into one seamless experience</b></h2><p>Perhaps more significant than the box itself is the architectural change happening behind it. 
Google is unifying <a href="https://search.google/ways-to-search/ai-overviews/">AI Overviews</a> — the AI-generated summary panels that appear atop traditional search results — with <a href="https://search.google/ways-to-search/ai-mode/">AI Mode</a>, the more immersive conversational search experience the company launched at I/O one year ago.</p><p>Starting Tuesday, this merged experience will be live across mobile and desktop worldwide. A user can type a question, receive an AI Overview alongside traditional results, and then continue directly into a back-and-forth AI Mode conversation to ask follow-up questions — all without navigating to a separate interface.</p><p>Reid explained the logic during the press briefing: the new AI search box is &quot;an upgrade of our traditional search box, and so the results take you directly to main search rather than AI mode.&quot; She noted that while some power users actively sought out AI Mode, &quot;for most users, they don&#x27;t actually want to have to think about, do they want more of a traditional page or an AI-forward search experience.&quot;</p><p>The goal, she said, was to ensure that &quot;for most users, they don&#x27;t have to think about where to go, they can just go to the search box they&#x27;re familiar with, and it feels like they get the best experience afterwards.&quot;</p><h2><b>One billion users and doubling queries reveal how fast search behavior is shifting</b></h2><p>Google&#x27;s decision to redesign the foundational interface of its most important product did not happen in a vacuum. The company shared a set of usage statistics during the briefing that reveal just how rapidly user behavior is already changing.</p><p><a href="https://search.google/ways-to-search/ai-mode/">AI Mode</a>, which launched in the United States at I/O 2025, has surpassed one billion monthly users in its first year. AI Mode queries have been doubling every quarter since launch. 
AI Overviews, the lighter-weight AI summaries, now reach more than 2.5 billion monthly users. And overall <a href="https://www.theverge.com/tech/920815/google-alphabet-q1-2026-earnings-sundar-pichai">search query volume hit an all-time high</a> last quarter — a data point the company had previously disclosed on its earnings call.</p><p>Sundar Pichai, Google&#x27;s CEO, framed these figures as evidence that AI features are additive, not cannibalistic, to search usage. &quot;When people use our AI-powered features in search, they use search more,&quot; he said. He added that he loves &quot;how search has become less about individual queries and feels more like an ongoing conversation, giving users deeper insights and connecting you with the vastness of the web.&quot;</p><p>Reid reinforced the point: &quot;It&#x27;s not just that people are searching more, it&#x27;s that they&#x27;re searching differently. They&#x27;re fully expressing their questions in granular detail, asking those follow-up questions and searching across modalities.&quot;</p><h2><b>Gemini 3.5 Flash gives Google&#x27;s AI search the speed it needs to work at scale</b></h2><p>Under the hood, the new search experience runs on <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">Gemini 3.5 Flash</a>, Google&#x27;s newest AI model, which the company also introduced at I/O. Google upgraded AI Mode&#x27;s underlying model to 3.5 Flash to deliver what Reid described as &quot;an even more powerful AI search experience.&quot;</p><p><a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">Gemini 3.5 Flash</a> is the workhorse of this year&#x27;s announcements. Google claims it outperforms its previous frontier model, <a href="https://deepmind.google/models/gemini/pro/">Gemini 3.1 Pro</a>, on nearly all benchmarks while running four times faster in output tokens per second than comparable frontier models. 
Pichai described it as being &quot;in a league of its own in the top right quadrant&quot; of the <a href="https://artificialanalysis.ai/">Artificial Analysis index</a>, which plots intelligence against speed — meaning it delivers near-frontier quality at dramatically lower latency.</p><p>That speed matters enormously for search. A conversational AI search experience that feels sluggish would be dead on arrival for a product that serves billions of queries daily. By coupling the redesigned interface with a model optimized for both quality and throughput, Google is attempting to make AI-powered search feel as instantaneous as the old keyword experience — while being dramatically more capable.</p><h2><b>Search can now build interactive visuals and custom mini apps on the fly</b></h2><p>The redesigned search box is also the gateway to a set of new capabilities that push search far beyond text-based answers. Google announced what it calls &quot;<a href="https://blog.google/products-and-platforms/products/search/search-io-2026/">generative UI</a>&quot; — the ability for search to dynamically build custom widgets, interactive visualizations, and even mini applications in real time, tailored to a user&#x27;s specific question.</p><p>Reid offered a concrete example during the briefing: a user could ask &quot;How do black holes affect space time?&quot; and receive an interactive visual in an AI Overview that brings the concept to life. Follow-up questions would trigger the system to dynamically generate entirely new visuals in real time. This is possible, she explained, because of &quot;a novel real-time code generation system we built in partnership with the Google DeepMind team&quot; that runs on Gemini 3.5 Flash. Generative UI capabilities will roll out to everyone this summer, free of charge.</p><p>But Google is going further still. 
For ongoing tasks — planning a wedding, organizing a move, tracking a fitness routine — users will be able to build what the company describes as customizable, stateful experiences within search, powered by its <a href="https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights/">Antigravity development platform</a>. These require no coding expertise. Users simply describe what they want in natural language, and search builds it. Those experiences will be available in coming months, starting with <a href="https://gemini.google/subscriptions/">Google AI Pro</a> and <a href="https://gemini.google/subscriptions/">Ultra</a> subscribers in the United States.</p><h2><b>AI agents that monitor the web around the clock are coming to search results</b></h2><p>The redesign also opens the door to what Google calls &quot;<a href="https://blog.google/products-and-platforms/products/search/search-io-2026/">information agents</a>&quot; — AI agents that users can configure directly within search to monitor the web 24/7 for specific conditions and deliver synthesized updates when those conditions are met.</p><p>A user could, for example, set up an agent to track market movements in a particular sector with specific parameters. The agent would create a monitoring plan, tap into real-time finance data, and proactively notify the user when conditions are met — complete with links and context for further research. Other use cases include apartment hunting, tracking sneaker drops, or monitoring any topic a user cares about. 
Information agents will launch first for <a href="https://gemini.google/subscriptions/">Google AI Pro</a> and <a href="https://gemini.google/subscriptions/">Ultra</a> subscribers this summer.</p><p>These agents sit within a much larger strategic pivot that Google articulated throughout the briefing: the company is going all-in on AI systems that don&#x27;t just answer questions but proactively take actions on users&#x27; behalf. Beyond search, Google introduced <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Gemini Spark</a>, a 24/7 personal AI agent that runs on dedicated virtual machines in Google Cloud. It unveiled the <a href="https://blog.google/products-and-platforms/products/shopping/google-shopping-cart">Universal Cart</a>, an intelligent cross-merchant shopping cart. It announced the <a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol">Agent Payments Protocol</a> for agents to make secure purchases. And it expanded its <a href="https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights/">Antigravity developer platform</a> into a full ecosystem for building autonomous AI agents.</p><h2><b>Publishers, advertisers, and SEO professionals face a new reality</b></h2><p>The redesign raises profound questions for the sprawling ecosystem — publishers, advertisers, SEO professionals — that has been built around the old model of keyword search and blue links.</p><p>If users increasingly express their needs as full, conversational sentences rather than fragmented keywords, the entire discipline of search engine optimization will need to evolve. Keyword-density strategies become less relevant when the AI is parsing natural language intent rather than matching strings. 
Content that answers deep, nuanced questions in authoritative ways becomes more valuable; content engineered to rank for two-word keyword fragments becomes less so.</p><p>For publishers, <a href="https://www.npr.org/2025/07/31/nx-s1-5484118/google-ai-overview-online-publishers">the stakes are existential</a>. AI Overviews already synthesize information from across the web and present it directly in search results, reducing the need for users to click through to source material. The new seamless AI Mode integration deepens that dynamic: users can now get an AI-generated answer and ask multiple follow-up questions without ever leaving the search page. Google has consistently maintained that its AI features drive more traffic to publishers, but the redesign puts that claim under renewed scrutiny as the search results page becomes more self-contained.</p><p>For advertisers — who fund the vast majority of Google&#x27;s revenue — the shift from keywords to conversations changes the calculus of ad targeting. Conversational queries contain richer intent signals, which could make ad targeting more precise and valuable. But they also create new ambiguities: when a user is in the middle of a multi-turn conversation with AI Mode, where does an ad naturally fit? Google did not detail changes to its advertising model during the briefing, but the structural shift in the interface will inevitably reshape how ads are surfaced and measured.</p><h2><b>The search box was always more than a product — it was a habit for billions of people</b></h2><p>There is a reason Google chose to redesign the search box rather than simply adding new features behind it. The search box is not just a product element at this point; it is a cultural artifact — one of the few pieces of digital infrastructure used by essentially the entire internet-connected world. 
Changing it sends an unmistakable message about where the company believes computing is headed.</p><p>For 25 years, the search box trained billions of people to think in keywords — to compress their curiosity into the shortest possible string of words. The new box invites them to do the opposite: to think out loud, to upload what they&#x27;re looking at, to ask follow-up questions, to let an AI system handle the compression.</p><p>Pichai tied the company&#x27;s broader ambitions to a striking statistic: Google&#x27;s surfaces now process over 3.2 quadrillion tokens per month, up seven-fold from a year ago. The company expects capital expenditures of approximately $180 to $190 billion in 2026 — roughly six times the $31 billion it spent four years ago — largely to support the infrastructure required for this AI transformation. When asked about the future of traditional search, he was direct. &quot;Search is the most used AI product in the world,&quot; he said.</p><p>The blinking cursor in Google&#x27;s search box still invites you to type. But after 25 years of teaching the world to speak in keywords, Google is now asking it to speak in sentences — and betting roughly $190 billion that it will.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Infrastructure</category>
            <category>Business</category>
            <category>AI</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/1TD0Sl7Zq6nnBSZMK9FXpl/41ce2cc6da055da7647670c71ba8aa6b/Nuneybits_Vector_art_of_an_oversized_white_search_bar_rimmed_in_695cac3f-1536-4438-acc1-51c16e2ff51f.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Google’s new AI agent can draft your emails, monitor your inbox and eventually spend your money]]></title>
            <link>https://venturebeat.com/technology/googles-new-ai-agent-can-draft-your-emails-monitor-your-inbox-and-eventually-spend-your-money</link>
            <guid isPermaLink="false">4hgtjfb2JrRUNu1pS2w78a</guid>
            <pubDate>Tue, 19 May 2026 17:45:00 GMT</pubDate>
            <description><![CDATA[<p>Google on Tuesday unveiled <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Gemini Spark</a>, a personal AI agent designed to work around the clock — drafting emails, assembling documents, monitoring inboxes, and eventually making purchases — even when a user&#x27;s laptop is closed and their phone is locked.</p><p>The announcement, made at <a href="https://io.google/2026/">Google I/O 2026</a>, is the company&#x27;s most ambitious attempt yet to transform its AI assistant from a tool that answers questions into one that autonomously completes tasks.</p><p>&quot;We are in that part of the cycle where people want to see real value in the products they use on a day-to-day basis,&quot; Sundar Pichai, CEO of Google and Alphabet, said during a press briefing ahead of the keynote address. With Spark, he argued, that value comes from an agent that never stops working. It operates around the clock in Google&#x27;s cloud, he said, so &quot;you don&#x27;t need to keep your laptop open to make sure it&#x27;s running.&quot;</p><p>The product arrives at an inflection point for the technology industry, as <a href="https://www.google.com/">Google</a>, <a href="https://microsoft.com/">Microsoft</a>, <a href="https://openai.com/">OpenAI</a>, <a href="https://www.anthropic.com/">Anthropic</a>, and <a href="https://www.apple.com/">Apple</a> all race to build AI systems that don&#x27;t merely converse but <i>do</i> — completing multi-step workflows with decreasing human supervision. 
It also raises urgent questions about trust, spending guardrails, and what happens when an artificial intelligence agent misinterprets a user&#x27;s intent.</p><p>Spark will begin rolling out this week to a small group of trusted testers, with a beta planned for <a href="https://gemini.google/subscriptions/">Google AI Ultra</a> subscribers in the United States next week.</p><h2><b>Inside the cloud architecture that lets Gemini Spark work while you sleep</b></h2><p>Unlike conventional AI assistants that activate only when prompted, <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Gemini Spark</a> runs persistently on Google Cloud infrastructure, powered by the company&#x27;s new <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">Gemini 3.5 Flash model</a> and what Google calls the <a href="https://developers.googleblog.com/build-with-google-antigravity-our-new-agentic-development-platform/">Antigravity</a> agent harness — the same underlying system that powers the company&#x27;s internal developer tools.</p><p>In practical terms, this means Spark can accept a complex instruction — &quot;email my boss a status update pulling the latest figures from our shared spreadsheet and the project timeline in our Slides deck&quot; — and then execute it across multiple Google applications without further input. The agent can pull context from emails, documents, and calendar entries, synthesize the information, and produce a finished output.</p><p>Josh Woodward, VP of Google Labs, Gemini App, and AI Studio, described the experience in visceral terms during the briefing: &quot;When you use it, it almost feels like you&#x27;re tossing things over your shoulder — Spark&#x27;s catching them and gets the job done.&quot;</p><p>The cloud-based architecture is a deliberate design choice. 
Because <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Spark</a> operates on remote servers rather than on a user&#x27;s device, it can continue working through tasks after a user walks away. A student could ask Spark to build a study guide that updates itself as new assignments arrive from a professor. A small business owner could instruct it to monitor their inbox and flag potential customer inquiries. A parent could delegate the logistics of a neighborhood block party — tracking RSVPs, coordinating contributions, scouting venues. These are not hypothetical scenarios. Woodward said they reflect how early testers have actually been using the product.</p><p>Over the coming months, Google plans to expand Spark&#x27;s capabilities significantly. The company will roll out <a href="https://modelcontextprotocol.io/docs/getting-started/intro">MCP (Model Context Protocol)</a> connections to more than 30 third-party partners, including <a href="https://www.canva.com/">Canva</a>, <a href="https://www.opentable.com/">OpenTable</a>, and <a href="https://www.instacart.com/">Instacart</a>. Users will also be able to text and email <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Spark</a> directly, create custom sub-agents for specialized tasks, and connect Spark to Chrome for web-based actions. 
Later this year, a new Android interface called <a href="https://blog.google/products-and-platforms/platforms/android/android-halo/">Android Halo</a> will provide live, at-a-glance visibility into what Spark is working on, displayed at the top of a user&#x27;s phone screen.</p><h2><b>Google compares its AI spending safeguards to giving a teenager their first debit card</b></h2><p>For all its ambition, <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Spark</a> confronts a fundamental challenge that has bedeviled every AI agent to date: How do you trust an autonomous system to act on your behalf — particularly when money is involved?</p><p>Google is acutely aware of the concern. When asked during the press briefing how Spark would avoid making unauthorized purchases, Woodward reached for an analogy that was striking in its candor. &quot;On the team, we think a lot of it is like if you&#x27;re giving a teenager their first debit card — there&#x27;s sort of limits and sort of constraints around it, and that&#x27;s how we&#x27;ll be designing Spark as we go through the year,&quot; he said.</p><p>At launch, <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Spark</a> will not autonomously make purchases. Users will be given explicit opportunities to review and approve any transaction before it goes through. But Google has built the infrastructure for a more autonomous future. Vidhya Srinivasan, who leads Google&#x27;s ads and commerce teams, introduced the <a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol">Agent Payments Protocol</a>, or AP2 — a system designed to let AI agents make secure purchases within user-defined boundaries.</p><p>The concept works like this: a user tells their agent the specific brands, products, and spending limits they&#x27;re comfortable with. 
If the criteria are met, the agent can automatically complete a purchase. AP2 creates what Google describes as a transparent, verifiable link between the user, the merchant, and payment processors, using privacy-preserving technology and tamper-proof digital mandates to ensure the agent is acting within its authorization. AP2 also generates a permanent digital paper trail, so that if a return is needed, the user and the merchant are looking at the same record. Google plans to bring AP2 to its products in the coming months, starting with Gemini Spark.</p><p>The system is underpinned by the <a href="https://ucp.dev/">Universal Commerce Protocol (UCP)</a>, an open-source standard Google announced earlier this year that gives agents and commerce systems a common language across the entire shopping journey. The <a href="https://github.com/Universal-Commerce-Protocol/ucp/discussions/379">UCP Tech Council</a> now includes <a href="https://www.amazon.com/">Amazon</a>, <a href="https://www.meta.ai/">Meta</a>, <a href="https://microsoft.com/">Microsoft</a>, <a href="https://www.salesforce.com/">Salesforce</a>, and <a href="https://stripe.com/">Stripe</a> — a remarkable coalition that underscores how seriously the industry takes the prospect of agent-driven commerce.</p><p>Google also announced the <a href="https://blog.google/products-and-platforms/products/shopping/google-shopping-cart/">Universal Cart</a>, an intelligent shopping cart that works across merchants and Google services. Users can add items while browsing Search, chatting with Gemini, watching YouTube, or reading Gmail. The cart then works in the background — tracking price drops, surfacing deals based on payment card perks, and even flagging product incompatibilities. The shopping infrastructure is rolling out in the U.S. 
this summer across Search and the Gemini app, with YouTube and Gmail to follow.</p><h2><b>How Google, OpenAI, Microsoft, Anthropic, and Apple are racing to build the definitive AI agent</b></h2><p>The announcement lands in the middle of the most intense competitive period in AI history. <a href="https://www.google.com/">Google</a>, <a href="https://microsoft.com/">Microsoft</a>, <a href="https://openai.com/">OpenAI</a>, <a href="https://www.anthropic.com/">Anthropic</a>, and <a href="https://www.apple.com/">Apple</a> are all racing to ship autonomous agents that can do real work — and each is placing a fundamentally different architectural bet on how to get there.</p><p>OpenAI recently unified its <a href="https://openai.com/index/introducing-operator/">Operator</a> and <a href="https://openai.com/index/introducing-deep-research/">deep research</a> capabilities into <a href="https://chatgpt.com/features/agent/">ChatGPT agent</a> — a system that brings together website interaction, information synthesis, and conversational intelligence. It carries out tasks using its own virtual computer, shifting between reasoning and action to handle complex workflows. The company emphasizes that users remain in control, with ChatGPT requesting permission before taking consequential actions. But the product has faced scrutiny over reliability. OpenAI&#x27;s Computer-Using Agent scores 38.1% on <a href="https://os-world.github.io/">OSWorld</a>, the industry benchmark for computer use tasks, while humans score over 72%.</p><p>Anthropic launched its <a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool">Claude Computer Use Agent</a> in research preview in March, giving Claude the ability to see, navigate, and control a user&#x27;s desktop — clicking buttons, opening applications, filling spreadsheets, and completing multi-step workflows. 
<a href="https://www.anthropic.com/product/claude-cowork">Claude Cowork</a> handles tasks autonomously — users give it a goal and Claude works on their computer, local files, and applications to return a finished deliverable. Anthropic has iterated aggressively, recently shipping ten pre-built financial agents and pursuing <a href="https://finance.yahoo.com/news/microsoft-and-anthropic-team-up-to-bring-claude-cowork-to-microsoft-365-130001836.html?guccounter=1&amp;guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&amp;guce_referrer_sig=AQAAAHf-_bBGe7n63ZmL9WvbJbVBUGvQZot0f8rsq87h8bUNAIiqZczpokQ3tX9ArNpd3yadf_i7gxP_a35_uPzOBFRnIFxWOzRGqPeocr5xUZNWqXDf5QeCOq4ADeZlPq3-k3ktkDu16dQ56dOTJTsRziQVGltoklvhgAlV0ig6jmM1">deep Microsoft 365 integration</a>.</p><p>Microsoft introduced <a href="https://www.microsoft.com/en-us/microsoft-365/blog/2026/03/09/copilot-cowork-a-new-way-of-getting-work-done/">Copilot Cowork</a> to move beyond chat and into execution — helping users delegate real tasks and have them completed. Cowork runs in the cloud, meaning users don&#x27;t have to worry about closing their laptop. The system is grounded in Work IQ, Microsoft&#x27;s intelligence layer that understands organizational data, tools, and structure. The shift moves Copilot from a sidebar helper to an orchestrator of autonomous agents.</p><p>Apple is also preparing a <a href="https://techcrunch.com/2026/05/17/apples-siri-revamp-could-include-auto-deleting-chats/">revamped Siri for WWDC 2026</a> that will act as an &quot;always-on agent&quot; capable of handling tasks across apps using personal data. Google&#x27;s Gemini models will help power the upgraded Siri through a multi-year deal reportedly costing Apple around $1 billion per year.</p><p>The convergence is unmistakable: every major platform is moving from assistants that talk to agents that act. But each is approaching the problem differently. OpenAI&#x27;s agent operates primarily through a browser. 
Anthropic&#x27;s works directly on a user&#x27;s desktop. Microsoft&#x27;s is tightly bound to the Microsoft 365 ecosystem. Apple&#x27;s emphasizes on-device processing and privacy. Google&#x27;s approach with Spark is distinctive in its bet on cloud persistence and deep integration with its own services.</p><p>Rather than controlling a user&#x27;s screen pixel by pixel, Spark works through structured integrations — Google&#x27;s own <a href="https://developers.google.com/workspace/explore">Workspace APIs</a> and, increasingly, third-party connections through MCP. The advantage is reliability and speed: structured tool use is far more predictable than screen-reading. The disadvantage is that Spark, at least initially, can only act within the systems it&#x27;s been connected to.</p><h2><b>The AI model behind Spark processes trillions of tokens a day — and Google says it could save enterprises billions</b></h2><p>Spark&#x27;s capabilities are inseparable from the model that drives it. <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Gemini 3.5 Flash</a>, also announced Monday, is Google&#x27;s new workhorse AI model — designed specifically for the demands of agentic workflows.</p><p>The performance claims are substantial. Google says <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">3.5 Flash</a> outperforms its previous frontier model, <a href="https://deepmind.google/models/gemini/pro/">Gemini 3.1 Pro</a>, across nearly all benchmarks, while running four times faster than comparable frontier models in terms of output tokens per second. An even more optimized version, available within Google&#x27;s <a href="https://developers.googleblog.com/build-with-google-antigravity-our-new-agentic-development-platform/">Antigravity</a> development platform, runs twelve times faster.</p><p>Pichai framed the economics bluntly. 
Companies processing roughly one trillion tokens per day on Google Cloud — a figure he said top enterprise customers are hitting — could save over $1 billion annually by shifting 80% of their workloads to a mix of Flash and frontier models like 3.5 Pro. In a market where, as Pichai noted, CIOs are already &quot;blowing through their annual token budgets and it&#x27;s only May,&quot; the cost argument may matter as much as the capability argument.</p><p>Internally, Google&#x27;s own developers have been consuming <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Gemini 3.5 Flash</a> at a staggering and rapidly accelerating pace. In March, Google was processing about half a trillion tokens per day internally. That figure has since grown to more than three trillion — doubling roughly every few weeks. Pichai described this as a &quot;powerful feedback loop&quot; that continually improves the model.</p><p>Koray Kavukcuoglu, CTO of Google DeepMind and Chief AI Architect for Google, said the model&#x27;s speed is what makes agentic use cases practical. &quot;3.5 Flash is especially good when deploying multiple agents simultaneously and completing long-running tasks,&quot; he said during the briefing, adding that Google had successfully tested agents building &quot;a working operating system entirely from scratch.&quot;</p><p>The 3.5 Pro model, the more powerful sibling, is currently being tested internally and will roll out next month.</p><h2><b>What Gemini Spark costs and where it fits in Google&#x27;s new subscription tiers</b></h2><p><a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Gemini Spark</a> will be available to <a href="https://gemini.google/subscriptions/">Google AI Ultra</a> subscribers. The company is simultaneously restructuring its subscription tiers to make the technology more accessible. 
A new Ultra plan at $100 per month provides a 5x higher usage limit than the Pro plan, along with priority access to Antigravity and 20TB of cloud storage. The top-tier Ultra plan drops from $250 to $200 per month, with a 20x higher usage limit and access to the full suite of capabilities.</p><p>Both tiers include Gemini Spark, the Daily Brief agent — a proactive morning digest that triages email, calendar, and tasks overnight — and access to the new Gemini Omni and 3.5 Flash models. The pricing positions Spark as a premium product — more expensive than Anthropic&#x27;s Claude Pro at $20 per month, but comparable to the higher tiers of competing products like <a href="https://support.claude.com/en/articles/11049741-what-is-the-max-plan">Claude Max</a> ($100–$200/month) and OpenAI&#x27;s <a href="https://chatgpt.com/plans/pro/">ChatGPT Pro</a> ($200/month).</p><h2><b>Why privacy, reliability, and ecosystem lock-in could undermine Google&#x27;s agent ambitions</b></h2><p>The risks are real and multidimensional.</p><p>Reliability remains the industry&#x27;s greatest challenge. Even the best AI models hallucinate, misinterpret instructions, and make errors that a human would never make. An agent that drafts an email to the wrong person, misreads a spreadsheet figure, or sends a payment to the wrong merchant could create consequences that are difficult to reverse. Google&#x27;s approach of requiring explicit approval for high-stakes actions like spending money or sending emails is a sensible safeguard — but it also limits how autonomous the agent can actually be. An agent that asks for confirmation at every turn isn&#x27;t much of an agent at all.</p><p>Privacy is another concern. Spark&#x27;s ability to synthesize information across a user&#x27;s entire Gmail inbox, calendar, documents, and chat history means it has an extraordinarily deep view of a person&#x27;s digital life. 
Google says Spark operates on a fully managed, secure runtime with isolated ephemeral virtual machines, encrypted credentials, and Data Loss Prevention policies. But the concentration of personal context in a single AI system — accessible through natural language — creates a surface area that will attract scrutiny from regulators, privacy advocates, and security researchers.</p><p>Market timing is uncertain, too. The consumer appetite for always-on AI agents is unproven at scale. Google says the Gemini app has 900 million monthly users, but it&#x27;s unclear how many of those users are ready for the conceptual leap from &quot;ask a question, get an answer&quot; to &quot;delegate a task, trust the outcome.&quot; The history of digital assistants — from Clippy to early Siri to Alexa — is littered with products that promised proactive intelligence and delivered frustration.</p><p>And then there is the question of ecosystem lock-in. Spark works best within Google&#x27;s own services. While MCP connections to third-party apps will broaden its reach, the initial experience is one of deep Workspace integration. For the billions of people who live inside Google&#x27;s ecosystem, this is a natural fit. For those who split their digital lives across Microsoft, Apple, and other platforms, Spark&#x27;s utility will be more limited — at least initially.</p><p>Woodward acknowledged as much when asked whether Spark would remain confined to the Google ecosystem. 
&quot;It&#x27;s going to be cross-platform in two ways,&quot; he said — through MCP integrations with third-party apps, and through availability on the web, Android, and iOS, with tasks syncing across devices via the cloud.</p><h2><b>The real test for Gemini Spark isn&#x27;t whether it can do the work — it&#x27;s whether people will let it</b></h2><p>Google&#x27;s bet with <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Gemini Spark</a> is that the AI industry&#x27;s center of gravity is shifting from models that think to systems that act — and that the company best positioned to win that transition is the one with the most comprehensive set of consumer services to act within. It is a bet backed by enormous infrastructure investment. Google expects to spend approximately $180 to $190 billion in capital expenditure this year — roughly six times what it spent in 2022 — much of it on the AI compute required to run agents like Spark at scale for hundreds of millions of users.</p><p>The technology, in other words, is arriving. The models are fast enough, the integrations deep enough, the payment rails secure enough. Google has built a system that can draft your emails, organize your calendar, monitor your inbox, and soon enough, spend your money — all while you sleep.</p><p>But the hardest problem in artificial intelligence has never been making a machine capable. It has been making a human comfortable. For two decades, Google&#x27;s core promise has been ten blue links and a search box — a transaction built on the assumption that the user is in control. Gemini Spark asks users to renegotiate that relationship entirely, to hand a set of keys to a system that is brilliant, tireless, and still, by its maker&#x27;s own admission, best compared to a teenager with a debit card.</p><p>Gemini Spark rolls out to trusted testers this week, with a broader beta for U.S. 
Google AI Ultra subscribers expected next week.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Business</category>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/2HYrJZXbNuYmyD8DBAQM6A/1429ed7baf4f012fac902220470dc383/Nuneybits_Vector_art_of_blinking_cursor_morphing_into_multicolo_0bc0a921-6033-4931-8bb8-73bf5877e798.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Google says Gemini 3.5 Flash can slash enterprise AI costs by more than $1 billion a year]]></title>
            <link>https://venturebeat.com/technology/google-says-gemini-3-5-flash-can-slash-enterprise-ai-costs-by-more-than-1-billion-a-year</link>
            <guid isPermaLink="false">5fwj3QvfBZREeeVc9ESRBZ</guid>
            <pubDate>Tue, 19 May 2026 17:45:00 GMT</pubDate>
            <description><![CDATA[<p>Google unveiled <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">Gemini 3.5 Flash</a> at its annual <a href="https://io.google/2026/">I/O developer conference</a> on Tuesday, a new artificial intelligence model that the company says shatters what had become a seemingly iron law of the AI industry: that the smartest models must also be the slowest and most expensive to run.</p><p>The model sits at the center of a sweeping set of announcements — from a video-generating &quot;world model&quot; called <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/">Gemini Omni</a> to a 24/7 personal AI agent called <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Gemini Spark</a> — but 3.5 Flash carries perhaps the most immediate consequence for the enterprises pouring billions of dollars into AI infrastructure. Sundar Pichai, Google&#x27;s chief executive, told reporters during a press briefing Monday that companies running roughly one trillion tokens per day on Google Cloud could save more than $1 billion annually by shifting 80 percent of their workloads to a mix of Flash and other frontier models.</p><p>&quot;You&#x27;ve probably heard anecdotes from other CIOs that companies are already blowing through their annual token budgets, and it&#x27;s only May,&quot; Pichai said, framing the model not just as a technical achievement but as a financial lifeline for organizations struggling with the runaway costs of deploying AI at scale.</p><p>The claim, if it holds, would be one of the most significant shifts in the economics of enterprise AI since large language models entered corporate computing.</p><h2>Why enterprises have been forced to choose between AI quality and AI speed</h2><p>For the past three years, organizations adopting generative AI have faced a painful trade-off. 
The most capable models — the ones that can reason through complex multistep problems, write reliable code, and parse dense financial documents — tend to be large, slow, and expensive to query. Faster, cheaper models sacrifice accuracy. Chief information officers have been forced into a kind of AI portfolio management: routing simple queries to lightweight models and reserving the heavy-duty reasoning engines for high-stakes tasks. It is a complex, brittle system that adds engineering overhead and often delivers inconsistent user experiences.</p><p><a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">Gemini 3.5 Flash</a> attacks that trade-off directly. According to Google&#x27;s internal benchmarks and a third-party analysis from <a href="https://artificialanalysis.ai/">Artificial Analysis</a>, the model outperforms Google&#x27;s own <a href="https://deepmind.google/models/gemini/pro/">Gemini 3.1 Pro</a> — a model the company positioned as its top-tier flagship just four to five months ago — on nearly every major benchmark. It scores 76.2 percent on <a href="https://www.tbench.ai/">Terminal-Bench 2.1</a>, reaches 1656 Elo on <a href="https://artificialanalysis.ai/evaluations/gdpval-aa">GDPval-AA</a>, hits 83.6 percent on <a href="https://labs.scale.com/leaderboard/mcp_atlas">MCP Atlas</a>, and leads in multimodal understanding with 84.2 percent on <a href="https://charxiv.github.io/">CharXiv Reasoning</a>.</p><p>Yet it does all of this while generating output tokens at four times the speed of comparable frontier models from competitors. 
Koray Kavukcuoglu, chief technology officer of Google DeepMind and chief AI architect for Google, told reporters the team has pushed even further: &quot;We have developed an even more optimized version of Flash, not just four times, but actually 12 times faster with the same quality.&quot; That turbo variant is available starting Tuesday inside <a href="https://developers.googleblog.com/build-with-google-antigravity-our-new-agentic-development-platform/">Antigravity</a>, Google&#x27;s agentic development platform.</p><p>Pichai put the performance gap in blunt terms: &quot;3.5 Flash is better than 3.1 Pro, which was just four months ago, and it&#x27;s at the almost, I would say, 90% of the performance of frontier models, 4x faster, much faster in Antigravity, maybe 12x, and about 1/3 to one half the cost.&quot;</p><p>Landing in what Artificial Analysis calls the &quot;top-right quadrant&quot; of its intelligence-versus-speed index — the only model to do so — Flash occupies a position no competitor currently holds.</p><h2>The trillion-token math behind Google&#x27;s $1 billion savings claim</h2><p>To understand why Flash matters so much to enterprise buyers, you need to understand the economics of tokens — the fundamental units of data that AI models process. Every query a customer service chatbot answers, every legal document an AI summarizes, every line of code an agent writes, consumes tokens. And at frontier-model pricing, those tokens add up fast.</p><p>Google says its model APIs now process around 19 billion tokens per minute. Across all of Google&#x27;s own surfaces — Search, the Gemini app, Workspace, and more — the company processes over 3.2 quadrillion tokens per month, a figure that has jumped seven-fold in the past year alone. Two years ago, at I/O 2024, the number was 9.7 trillion per month.</p><p>The explosion in token consumption is not unique to Google. 
Enterprises across industries are discovering that the more capable their AI deployments become, the more tokens they burn. Agentic workflows — where AI systems autonomously execute multistep tasks, call tools, write and run code, and iterate on their own output — are particularly token-hungry. A single agentic coding session can consume orders of magnitude more tokens than a simple question-and-answer exchange.</p><p>This is where Flash&#x27;s cost advantage becomes transformative. The model delivers what Google describes as frontier-level capabilities at less than half the price, in some cases almost a third the price, of comparable frontier models. For a hypothetical enterprise processing one trillion tokens per day on Google Cloud — a scale Pichai said top customers are already reaching — the savings from shifting 80 percent of workloads to a Flash-and-frontier blend would exceed $1 billion per year.</p><p>That is not a rounding error. It is the kind of number that reshapes procurement decisions, accelerates deployment timelines, and fundamentally alters the return-on-investment calculus for AI initiatives that many boards of directors have been scrutinizing with increasing impatience.</p><h2>How Google&#x27;s own engineers created a data flywheel that rivals cannot easily copy</h2><p>Perhaps the most strategically significant detail Google shared Tuesday was not a benchmark score or a price point. It was a chart showing the company&#x27;s own internal token consumption on <a href="https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights/">Antigravity 2.0</a>, its reimagined agentic development platform.</p><p>In March 2026, Google&#x27;s developers were processing roughly half a trillion tokens per day inside <a href="https://developers.googleblog.com/build-with-google-antigravity-our-new-agentic-development-platform/">Antigravity</a>. 
By the time of the I/O press briefing in mid-May, that figure had surged past three trillion — a six-fold increase in approximately ten weeks, with usage doubling &quot;literally every few weeks,&quot; according to Pichai.</p><p>This internal usage creates what AI researchers call a data flywheel: the more Google&#x27;s own engineers use 3.5 Flash to build products, the more real-world signal the model team collects on where the model excels and where it stumbles. That signal feeds back into model improvement, which makes the model more useful, which drives more usage, which generates more signal. It is a virtuous cycle — and it is one that competing AI labs, which rely primarily on external developer usage and synthetic benchmarks, cannot easily replicate at the same speed or fidelity.</p><p>&quot;That scale creates a powerful feedback loop, and that is what has allowed us to keep improving the 3.5 series of models,&quot; Pichai said.</p><p>When pressed during the Q&amp;A about the competitive frontier — particularly in light of recent advances from rival labs — Pichai acknowledged the landscape is &quot;very dynamic&quot; and &quot;moving fast&quot; but expressed confidence in Google&#x27;s breadth. 
He added that the company&#x27;s focus with the 3.5 series has been on &quot;taking the model intelligence, making sure tool use, instruction following, long horizon use cases, agent decoding all work well.&quot;</p><p>Kavukcuoglu reinforced the agentic emphasis, noting that 3.5 Flash &quot;can now handle multi-hour autonomous sessions&quot; and &quot;can independently execute complex coding pipelines or manage iterative research projects entirely by itself.&quot; The team, he said, even tested the model by having agents build a working operating system entirely from scratch.</p><h2>Antigravity 2.0 transforms Google&#x27;s code editor into an agent command center</h2><p>The arrival of 3.5 Flash is tightly coupled with the launch of <a href="https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights/">Antigravity 2.0</a>, a significant expansion of the agentic development platform Google first introduced six months ago. What began as a coding environment has evolved into what Google describes as a full platform for developing and managing teams of autonomous AI agents, and the company says millions of developers are already building with it.</p><p><a href="https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights/">Antigravity 2.0</a> ships as a new standalone desktop application that serves as a central hub for orchestrating multiple agents simultaneously. Google offered the example of running one agent to code a website, a second to generate brand assets, and a third to plan product architecture — all in parallel, all managed from a single interface. For developers who prefer command-line workflows, there is Antigravity CLI. 
And for those building programmatic integrations, the new Antigravity SDK provides direct access to the same agent harness powering Google&#x27;s own first-party products.</p><p>The co-development of 3.5 Flash and <a href="https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights/">Antigravity 2.0</a> is no accident. &quot;We have co-developed <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">3.5 Flash</a> together with Google Antigravity, our agentic development platform,&quot; Kavukcuoglu said. This tight integration means Flash&#x27;s strengths — speed, tool use, long-context reasoning, and code generation — are specifically tuned for the kinds of workloads developers execute inside the platform.</p><p>Google is also launching <a href="https://blog.google/innovation-and-ai/technology/developers-tools/managed-agents-gemini-api/">Managed Agents</a> in the <a href="https://ai.google.dev/gemini-api/docs">Gemini API</a>, allowing developers to spin up an agent with a single API call that reasons, uses tools, and executes code in an isolated Linux environment. And it introduced CodeMender, an AI security agent that uses Gemini&#x27;s advanced reasoning to automatically find and fix critical code vulnerabilities — a capability Kavukcuoglu described as essential as agentic systems write an increasing share of the world&#x27;s code.</p><h2>Google&#x27;s $190 billion infrastructure bet and the custom silicon powering cheaper AI</h2><p>The models and platforms sit atop a staggering infrastructure investment that Pichai revealed during the briefing: Google expects capital expenditures of approximately $180 billion to $190 billion in 2026 — roughly six times the $31 billion the company spent in 2022, just four years ago.</p><p>A key component of that spending is custom silicon. 
The company recently unveiled its eighth generation of <a href="https://cloud.google.com/tpu">Tensor Processing Units</a>, adopting for the first time a dual-chip architecture with specialized designs for training (TPU 8o) and inference (TPU 8i). Google says it can now distribute model training across multiple data center sites using a system called Pathways, scaling beyond one million TPUs globally — a setup the company claims constitutes the largest training cluster in the world.</p><p>&quot;This means training larger, more capable models in weeks, rather than months,&quot; Pichai said. The infrastructure advantage matters enormously for Flash&#x27;s economics. Custom silicon optimized for inference means Google can run Flash at lower cost per token than competitors relying on general-purpose GPUs, and the savings get passed along — at least partially — to customers.</p><p>The capex figure also signals something strategic about Google&#x27;s long-term posture. While some investors have grown nervous about the astronomical sums cloud providers are spending on AI infrastructure, Google is framing the spending as a competitive moat. The more infrastructure it builds, the cheaper it can run inference, the more attractive its models become, and the more usage it captures to improve the next generation. It is the flywheel logic again, extended from software all the way down to silicon.</p><h2>Gemini Omni, Spark, and the consumer products Flash now powers at massive scale</h2><p>While the enterprise cost story dominates the Flash narrative, Google also made sweeping moves on the consumer side that put the model to work across products reaching billions of people. 
Flash is now the default model powering the <a href="https://gemini.google.com/app">Gemini app</a> — which has surpassed 900 million monthly active users, more than doubling from 400 million a year ago — and AI Mode in Google Search, which has crossed one billion monthly users in its first year.</p><p>Google introduced <a href="https://blog.google/innovation-and-ai/products/gemini-app/next-evolution-gemini-app/">Gemini Spark</a>, a 24/7 personal AI agent that runs on dedicated virtual machines in Google Cloud and operates in the background even when a user&#x27;s device is off. Powered by 3.5 Flash with the full Antigravity harness, Spark integrates with Gmail, Docs, Sheets, and Slides. Josh Woodward, who leads Google Labs and the Gemini app, described the experience vividly: &quot;When you use it, it almost feels like you&#x27;re tossing things over your shoulder, Spark&#x27;s catching them and gets the job done.&quot; On the safety front, Spark requires explicit user approval before high-stakes actions. Google also announced the Agent Payments Protocol, which lets users set strict guardrails — approved brands, spending caps, specific merchants — before an agent can spend money on their behalf. Woodward compared the design to &quot;giving a teenager their first debit card — there&#x27;s sort of limits and sort of constraints around it.&quot;</p><p>Alongside Flash, Google unveiled <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/">Gemini Omni</a>, a model capable of generating any output from any input, starting with video. Kavukcuoglu drew a sharp distinction from Google&#x27;s existing Veo model: &quot;Veo is a text-to-video model. 
Omni is a true and true multi-model input, multi-model output model.&quot; All Omni-generated content carries <a href="https://deepmind.google/models/synthid/">Google&#x27;s SynthID watermark</a>, and the company announced that OpenAI, Kakao, and ElevenLabs are adopting SynthID as well.</p><p>The company also reimagined its search box for the first time in over 25 years, introduced information agents that monitor the web around the clock for user-defined conditions, and launched the <a href="https://blog.google/products-and-platforms/products/shopping/google-shopping-cart/">Universal Cart</a> — an AI-powered cross-merchant shopping cart built on Google Wallet. Liz Reid, who leads Google Search, called the new search box &quot;the biggest upgrade to our iconic search box since its debut.&quot;</p><h2>What Google&#x27;s six-month model cadence means for the enterprise AI cost curve</h2><p>Google signaled that <a href="https://blog.google/innovation-and-ai/technology/developers-tools/google-io-2026-developer-highlights">3.5 Flash</a> is just the opening act of the 3.5 series. Gemini 3.5 Pro is currently in internal testing and will roll out to everyone next month. Kavukcuoglu indicated the company has been operating on roughly a six-month cadence for major model updates — Gemini 3 in November, 3.5 in May — and expects that rhythm to continue.</p><p>When a reporter from The New York Times asked how Google determines whether a release warrants a full numerical jump or a half-step increment, Kavukcuoglu said the numbering reflects the magnitude of research progress: &quot;What defines the numbering update is really the progress that we see in our research and how it is reflected in the models and the impact that they have.&quot;</p><p>For enterprise buyers, that cadence carries an important implication: the cost-performance curve is not just improving — it is improving on a predictable schedule. 
A model that outperforms the previous flagship at a third the cost every six months fundamentally changes the planning horizon for AI investments. It means the token budgets that companies are blowing through today may look quaint by the end of the year.</p><p>Google&#x27;s announcements arrive at a moment of intense competition. <a href="https://openai.com/">OpenAI</a>, <a href="https://www.anthropic.com/">Anthropic</a>, <a href="https://www.meta.ai/">Meta</a>, and a constellation of smaller labs are all racing to deliver models that balance capability with cost. Microsoft has been aggressively integrating OpenAI&#x27;s models into Azure and Copilot. But Google benefits from a structural advantage that is easy to overlook: distribution. With 13 products serving more than a billion users each — five of which exceed three billion — Google can deploy Flash to an audience no pure-play AI lab can match. Every improvement immediately benefits Search, Gmail, Docs, Maps, and YouTube. And the usage data flowing back from those billions of interactions feeds the very flywheel that makes the next model better.</p><p>The question now is whether the $1 billion savings figure — an eye-catching projection based on a specific workload mix — will survive contact with the messy reality of corporate AI deployments, where legacy systems, compliance requirements, and organizational inertia have a way of blunting even the most compelling cost curves. But if Google&#x27;s own internal usage is any guide — three trillion tokens a day and climbing, doubling every few weeks, with no sign of slowing — the company is not just selling the bet. It is making the bet itself, with its own engineers, on its own infrastructure, at a scale no customer has yet attempted. In the AI cost wars, the most persuasive pitch may simply be: we did it first.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Infrastructure</category>
            <category>Business</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/7fKaf2uuvjEF3Xx4BoMfGO/18db678b063c2016fd103fb10b401e6e/Nuneybits_Vector_art_of_a_retro_desktop_computer_with_a_CRT_mon_d9002d4b-28a6-41f8-adae-f920ec287e8a.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[AI IQ is here: a new site scores frontier AI models on the human IQ scale. The results are already dividing tech.]]></title>
            <link>https://venturebeat.com/technology/ai-iq-is-here-a-new-site-scores-frontier-ai-models-on-the-human-iq-scale-the-results-are-already-dividing-tech</link>
            <guid isPermaLink="false">in4gU02Tb6pVjaQ18BxoA</guid>
            <pubDate>Wed, 13 May 2026 23:47:24 GMT</pubDate>
            <description><![CDATA[<p>For decades, the IQ test has been one of the most familiar — and most contested — yardsticks for human intelligence. Now, a startup project called <a href="https://www.aiiq.org/">AI IQ</a> is applying the same metaphor to artificial intelligence, assigning estimated intelligence quotients to more than 50 of the world&#x27;s most powerful language models and plotting them on a standard bell curve.</p><p>The result is a set of interactive visualizations at <a href="https://www.aiiq.org/">aiiq.org</a> that have ricocheted across social media in the past week, drawing praise from enterprise technologists who say the charts make an impossibly complex market legible — and sharp criticism from researchers and commentators who warn the entire framework is misleading.</p><p>&quot;This is super useful,&quot; wrote <a href="https://x.com/ThibautMelen/status/2054559193793708216">Thibaut Mélen</a>, a technology commentator, on X. &quot;Much easier to understand model progress when it&#x27;s mapped like this instead of another giant leaderboard table.&quot;</p><p>Brian Vellmure, a business strategist, offered a similar endorsement: &quot;This is helpful. Anecdotally tracks with personal experience.&quot;</p><p>But the backlash arrived just as quickly. &quot;It&#x27;s nonsense. AI is far too jagged.
The map is not the territory,&quot; posted <a href="https://x.com/AiDeeply/status/2054388867105460258">AI Deeply</a>, an artificial intelligence commentary account, crystallizing a worry shared by many researchers: that reducing a language model&#x27;s sprawling, uneven capabilities to a single number creates a dangerous illusion of precision.</p><h2><b>Twelve benchmarks, four dimensions, and one controversial number: how AI IQ actually works</b></h2><p>AI IQ was created by <a href="https://www.shea.io/">Ryan Shea</a>, an engineer, entrepreneur, and angel investor best known as a co-founder of the blockchain platform <a href="https://www.stacks.co/">Stacks</a>. Shea also co-founded <a href="https://voterbase.org/">Voterbase</a> and has invested in the early stages of several unicorns, including <a href="https://opensea.io/">OpenSea</a>, <a href="https://lattice.com/">Lattice</a>, <a href="https://www.anchorage.com/">Anchorage</a>, and <a href="https://mercury.com/">Mercury</a>. He holds a Bachelor of Science in Mechanical Engineering from Princeton University.</p><p>The site&#x27;s methodology rests on a deceptively simple formula. <a href="https://www.aiiq.org/">AI IQ</a> groups 12 benchmarks into four reasoning dimensions: abstract, mathematical, programmatic, and academic. The composite IQ is a straight average of those four dimension scores: IQ = ¼ (IQ_Abstract + IQ_Math + IQ_Prog + IQ_Acad).</p><p>The abstract reasoning dimension draws from <a href="https://arcprize.org/arc-agi/1">ARC-AGI-1</a> and <a href="https://arcprize.org/arc-agi/2">ARC-AGI-2</a>, the notoriously difficult pattern-recognition benchmarks designed to test general fluid intelligence. Mathematical reasoning includes <a href="https://epoch.ai/frontiermath">FrontierMath</a> (Tiers 1–3 and Tier 4), <a href="https://www.vals.ai/benchmarks/aime">AIME</a>, and <a href="https://www.vals.ai/benchmarks/proof_bench">ProofBench</a>. 
Programmatic reasoning uses <a href="https://www.tbench.ai/">Terminal-Bench 2.0</a>, <a href="https://www.swebench.com/verified.html">SWE-Bench Verified</a>, and <a href="https://scicode-bench.github.io/">SciCode</a>. Academic reasoning pulls from <a href="https://agi.safe.ai/">Humanity&#x27;s Last Exam</a>, <a href="https://critpt.com/">CritPt</a>, and <a href="https://epoch.ai/benchmarks/gpqa-diamond?view=graph&amp;tab=release-date">GPQA Diamond</a>.</p><p>Each raw benchmark score gets mapped to an implied IQ through what the site describes as &quot;hand-calibrated difficulty curves.&quot; Crucially, the methodology compresses ceilings for benchmarks considered easier or more susceptible to data contamination, preventing them from inflating scores above 100. Harder, less gameable benchmarks retain higher ceilings. The system also handles missing data conservatively: models need scores on at least two of the four dimensions to receive a derived IQ, and when benchmarks are absent, the pipeline deliberately pulls scores down rather than up. The site states that &quot;every derived IQ averages all four dimensions, so missing coverage cannot make a model look better by omission.&quot;</p><h2><b>OpenAI leads the bell curve, but the gap between the top AI models has never been smaller</b></h2><p>As of mid-May 2026, the <a href="https://www.aiiq.org/">AI IQ</a> charts tell a story of rapid convergence at the top of the frontier — and widening diversity in the tiers below.</p><p>According to the Frontier IQ Over Time chart, <a href="https://openai.com/index/introducing-gpt-5-5/">GPT-5.5</a> from OpenAI currently sits at the peak of the bell curve, with an estimated IQ near 136 — the highest of any model tracked. 
It is closely followed by <a href="https://openai.com/index/introducing-gpt-5-4/">GPT-5.4</a> (approximately 131), <a href="https://www.anthropic.com/news/claude-opus-4-7">Opus 4.7</a> from Anthropic (approximately 132), and <a href="https://www.anthropic.com/news/claude-opus-4-6">Opus 4.6</a> (approximately 129). Google&#x27;s <a href="https://deepmind.google/models/gemini/pro/">Gemini 3.1 Pro</a> lands near 131, making the top cluster extraordinarily tight.</p><p>That compression is not unique to AI IQ&#x27;s framework. <a href="https://www.visualcapitalist.com/">Visual Capitalist</a>, drawing from a separate Mensa-based ranking by TrackingAI, recently observed the same dynamic, noting that &quot;the biggest takeaway is how compressed the top of the leaderboard has become.&quot; On that scale, Grok-4.20 Expert Mode and GPT 5.4 Pro tied at 145, with Gemini 3.1 Pro at 141.</p><p>Below the frontier cluster, the AI IQ charts show a crowded midfield. Models from Chinese labs — <a href="https://www.kimi.com/ai-models/kimi-k2-6">Kimi K2.6</a>, <a href="https://z.ai/blog/glm-5">GLM-5</a>, <a href="https://api-docs.deepseek.com/news/news251201">DeepSeek-V3.2</a>, <a href="https://huggingface.co/Qwen/Qwen3.6-27B">Qwen3.6</a>, <a href="https://huggingface.co/MiniMaxAI/MiniMax-M2.7">MiniMax-M2.7</a> — bunch between roughly 112 and 118, making the cost-performance tier increasingly competitive for enterprise buyers who don&#x27;t need the absolute best model for every task. 
One X user, ovsky, noted that the data &quot;confirms experience with sonnet 4.6 being an absolute workhorse as opposed to opus 4.5&quot; — pointing to the way the charts can validate practitioner intuitions that headline rankings often miss.</p><h2><b>Why emotional intelligence scores are becoming the new battleground in AI model rankings</b></h2><p>What distinguishes <a href="https://www.aiiq.org/">AI IQ</a> from most other benchmarking efforts is its inclusion of an &quot;EQ&quot; — emotional intelligence — score. The site maps each model&#x27;s EQ-Bench 3 Elo score and Arena Elo score to an estimated EQ using calibrated piecewise-linear scales, then takes a 50/50 weighted composite of the two.</p><p>The EQ scores produce a meaningfully different ranking than IQ alone. On the IQ vs. EQ scatter plot, Anthropic&#x27;s <a href="https://www.anthropic.com/news/claude-opus-4-7">Opus 4.7</a> leads on EQ with a score near 132, pushing it into the upper-right quadrant — the most desirable position, signaling both high cognitive and high emotional intelligence. OpenAI&#x27;s <a href="https://openai.com/index/introducing-gpt-5-5/">GPT-5.5</a> and <a href="https://openai.com/index/introducing-gpt-5-4/">GPT-5.4</a> cluster in the high-IQ zone but lag slightly on EQ. Google&#x27;s Gemini 3.1 Pro sits in a strong middle position on both axes.</p><p>One notable methodological choice has drawn attention: <a href="https://eqbench.com/">EQ-Bench 3</a> is judged by Claude, an Anthropic model, which the site acknowledges &quot;creates potential scoring bias in favor of Anthropic models.&quot; To correct for this, AI IQ subtracts a 200-point Elo penalty from the EQ-Bench component for all Anthropic models before mapping to implied EQ. The Arena component is unaffected since it uses human judges. That self-correction is unusual in the benchmarking world, and it suggests Shea is aware of the methodological minefield he has entered. 
Still, the EQ dimension captures something IQ alone cannot: the growing importance of conversational quality, collaboration, and trust in models deployed for user-facing work.</p><h2><b>The AI cost-performance chart that enterprise buyers actually need to see</b></h2><p>Perhaps the most practically useful chart on the site is not the bell curve but the <a href="https://www.aiiq.org/costs/">IQ vs. Effective Cost</a> scatter plot. It maps each model&#x27;s estimated IQ against an &quot;effective cost&quot; metric — defined as the token cost for a task using 2 million input tokens and 1 million output tokens, multiplied by a usage efficiency factor.</p><p>The chart reveals a familiar pattern in enterprise technology: the best models are not always the best value. GPT-5.5 and Opus 4.7 sit in the upper-left corner — high IQ, high cost, with effective per-task costs north of $30 and $50 respectively. Meanwhile, models like <a href="https://openai.com/index/introducing-gpt-5-4-mini-and-nano/">GPT-5.4-mini</a>, <a href="https://api-docs.deepseek.com/news/news251201">DeepSeek-V3.2</a>, and <a href="https://huggingface.co/MiniMaxAI/MiniMax-M2.7">MiniMax-M2.7</a> occupy a sweet spot in the middle: respectable IQ scores between 112 and 120, at effective costs ranging from roughly $1 to $5 per task. At the cheapest extreme, <a href="https://openai.com/index/introducing-gpt-oss/">GPT-oss-20b</a> (an open-source OpenAI model) appears near $0.20 effective cost with an IQ around 107 — potentially the most economical option for bulk classification or extraction workloads.</p><p>The site also offers a 3D visualization mapping IQ, EQ, and effective cost simultaneously. A dashed line running through the cube points toward the ideal: higher IQ, higher EQ, and lower cost. Models near the &quot;green end&quot; of that axis are stronger all-around deals; those near the &quot;red end&quot; sacrifice capability, cost efficiency, or both. 
For CIOs staring at API invoices, the implication is clear: the intelligence gap between a $50 model and a $3 model has narrowed enough that routing — using expensive models for hard problems and cheap ones for everything else — is no longer optional. It is the dominant architecture for serious AI deployments.</p><h2><b>Critics say AI&#x27;s &quot;jagged&quot; capabilities make a single IQ score dangerously misleading</b></h2><p>The loudest objection to AI IQ is philosophical, and it cuts deep. Critics argue that collapsing a model&#x27;s uneven capabilities into a single score obscures more than it reveals.</p><p>&quot;IQ as a proxy is fading — we&#x27;re seeing reasoning density spikes that don&#x27;t map to g-factor,&quot; posted <a href="https://x.com/zayaonronin/status/2054329633697309014">Zaya</a>, a technology commentator, on X. &quot;GPT-5.5 already hit saturation on MMLU-Pro, but still fails ClockBench 50% of the time.&quot;</p><p>That observation touches on what AI researchers call the &quot;<a href="https://www.nytimes.com/2026/04/15/technology/how-jagged-intelligence-can-reframe-the-ai-debate.html">jaggedness</a>&quot; problem: large language models often exhibit wildly uneven capabilities, excelling at graduate-level physics while failing at tasks a child could do. A composite score can paper over those gaps.</p><p>Pressureangle, another X user, posted a more granular critique, calling out &quot;<a href="https://x.com/Pressureangle/status/2054282480081850718">complete lack of transparency</a>&quot; and arguing the site never fully discloses how its calibration curves were created or validated. In fairness, AI IQ does list its 12 benchmarks and shows the shape of each calibration curve in its methodology modal. But the raw data and precise mathematical transformations are not published as open datasets — a gap that matters to researchers accustomed to fully reproducible methods.</p><p>Others questioned the premise itself.
&quot;As useless as human IQ testing,&quot; wrote haashim on X. Shubham Sharma, an AI and technology writer, offered a constructive alternative: &quot;Why not having the Models take an official (MENSA-Grade) test? Wouldn&#x27;t this be the most accurate and most &#x27;human-comparable&#x27; way to benchmark intelligence?&quot; That approach already exists through TrackingAI, which administers the Mensa Norway IQ test to language models. But Mensa-style tests measure only abstract pattern recognition, while AI IQ attempts a broader composite across coding, mathematics, and academic reasoning. As Visual Capitalist noted, &quot;an IQ-style benchmark captures only one slice of capability.&quot; Each approach has tradeoffs — and neither has won the argument yet.</p><h2><b>The real race isn&#x27;t for the highest score — it&#x27;s for the smartest model stack</b></h2><p>For all the debate about methodology, the most important signal in AI IQ&#x27;s data may not be any single model&#x27;s score. It is the shape of the market the charts reveal.</p><p>There are now more than 50 frontier-class models available through APIs, from at least 14 major providers spanning the United States, China, and Europe. Each provider publishes its own benchmarks, often cherry-picked to showcase strengths. The result is a Tower of Babel where no two companies measure the same thing in the same way. Academic research has highlighted that &quot;most benchmarks introduce bias by focusing on a particular type of domain,&quot; and the <a href="https://www.aiiq.org/iq/">Frontier IQ Over Time</a> chart on AI IQ shows just how fast the targets are moving: in October 2023, GPT-4-turbo sat near an estimated IQ of 75. By early 2026, the top models were brushing 135 — roughly 60 points of improvement in 30 months.</p><p>That pace raises a fundamental question about whether any scoring system can keep up.
The site compresses ceilings for saturated benchmarks, but as models continue to max out even the hardest tests — <a href="https://arcprize.org/arc-agi/2">ARC-AGI-2</a>, <a href="https://epoch.ai/benchmarks/frontiermath-tier-4?view=graph&amp;tab=release-date">FrontierMath Tier 4</a>, <a href="https://agi.safe.ai/">Humanity&#x27;s Last Exam</a> — the framework will face the same ceiling effects that have plagued every AI evaluation before it. Connor Forsyth pointed to this dynamic on X: &quot;<a href="https://x.com/connorsforsyth/status/2054484015319572910">ARC AGI 3 disagrees</a>,&quot; he wrote, referencing a next-generation benchmark that may already be undermining current scores.</p><p><a href="https://www.aiiq.org/">AI IQ</a> is not perfect. Its methodology is partially opaque. Its IQ metaphor can mislead. And its creator acknowledges known biases while likely missing others. But the alternative — wading through dozens of provider-specific benchmark tables, each using different test suites and scoring conventions — is worse. The site offers enterprise buyers something genuinely scarce: a single framework for comparing models across providers, dimensions, and price points, updated regularly, with enough nuance to show that the right answer to &quot;which model is best?&quot; is almost always &quot;it depends on the task.&quot;</p><p>As Debdoot Ghosh mused on X after viewing the charts: &quot;<a href="https://x.com/ryaneshea/status/2054209480917754033">Now a human&#x27;s role is just to orchestrate?</a>&quot;</p><p>Maybe. But if the AI IQ data shows anything clearly, it is that orchestration — knowing which model to deploy, when, and at what price — has become its own form of intelligence. And for that, there is no benchmark yet.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Business</category>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/LXlQ0Fb1icQNc42FLtzxY/f7d524bafbe137ed51215797495e242d/Nuneybits_Vector_art_of_glowing_scatterplot_transformed_into_co_2860d5e5-a9d2-4366-acd8-947838753fb6.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
        <item>
            <title><![CDATA[Anthropic finally beat OpenAI in business AI adoption — but 3 big threats could erase its lead]]></title>
            <link>https://venturebeat.com/technology/anthropic-finally-beat-openai-in-business-ai-adoption-but-3-big-threats-could-erase-its-lead</link>
            <guid isPermaLink="false">vDhn8EUlHvFIuZ0z264X8</guid>
            <pubDate>Wed, 13 May 2026 21:53:05 GMT</pubDate>
            <description><![CDATA[<p>For the first time since the AI race began, more American businesses are paying for Anthropic&#x27;s Claude than for OpenAI&#x27;s ChatGPT.</p><p><a href="https://ramp.com/leading-indicators/ai-index-may-2026">Adoption of Anthropic rose 3.8% in April to 34.4% of businesses</a>, according to the May 2026 release of the <a href="https://ramp.com/data/ai-index">Ramp AI Index</a>. OpenAI&#x27;s adoption fell 2.9% to 32.3%. Overall AI adoption among businesses rose 0.2 percentage points to 50.6%.</p><p>The crossover — published Tuesday by <a href="https://ramp.com/">Ramp</a>, the corporate card and finance automation platform that tracks spending patterns across more than 50,000 U.S. businesses — marks the culmination of a yearlong surge by Anthropic that few in the industry predicted. Anthropic has quadrupled its business adoption over the past year, while OpenAI grew its business adoption by only 0.3%.</p><p>But the same report that crowns a new market leader also warns that Anthropic&#x27;s position may be more fragile than it appears — threatened by escalating costs, compute constraints, and the very token-based pricing model that has fueled the company&#x27;s extraordinary revenue growth.</p><h2><b>How Anthropic went from a niche player to the most popular AI model in corporate America</b></h2><p>To appreciate the scale of the shift, consider where the two companies stood a year ago. In April 2025, <a href="https://ramp.com/leading-indicators/ai-index-may-2026">OpenAI commanded roughly 32% of business AI adoption</a> according to Ramp&#x27;s underlying data, while Anthropic stood at under 8%. OpenAI had built an early, commanding lead as the consumer default — ChatGPT was where most people first encountered AI, and that momentum carried into corporate purchasing decisions.</p><p>Anthropic&#x27;s path was different.
The company was popular early on with the earliest adopters — engineers, AI evangelists, the technical vanguard inside organizations. As Ramp lead economist <a href="https://ramp.com/leading-indicators/top-saas-vendors-on-ramp-may-2026">Ara Kharazian</a> noted in the March 2026 edition of the index, Anthropic leveraged that early-adopter base to go mainstream. By February, Anthropic was winning about 70% of head-to-head matchups against OpenAI among businesses purchasing AI services for the first time — a complete reversal of the trends observed in 2025.</p><p>The trajectory is visible in Ramp&#x27;s underlying data. The company&#x27;s adoption figures show Anthropic climbing from 0.03% of businesses in June 2023 to 7.94% by April 2025, then rocketing to 34.44% by April 2026.</p><p>OpenAI, meanwhile, peaked near 36.5% in mid-2025 and has been slowly declining since. The engine behind much of this growth is a single product: <a href="https://code.claude.com/docs/en/desktop">Claude Code</a>, the company&#x27;s agentic AI coding tool, which has become the fastest-growing product in Anthropic&#x27;s history. A recent analysis estimated that 4% of all GitHub public commits worldwide were being authored by Claude Code — double the percentage from just one month prior.</p><p>Business Insider reported in April that the <a href="https://www.businessinsider.com/anthropic-may-soon-pass-openai-measure-ai-business-spending-ramp-2026-4">crossover was imminent</a>. 
A Ramp spokesperson told the outlet that &quot;at the current pace, Anthropic is on track to surpass OpenAI within the next two months,&quot; noting that it already led &quot;among early adopters, including VC-backed companies, and in key sectors like software, finance, and professional services.&quot; That prediction proved accurate almost to the day.</p><h2><b>AI adoption reaches a workplace tipping point, but the productivity revolution hasn&#x27;t arrived yet</b></h2><p>The Ramp data on business spending finds its complement in a separate workforce survey that underscores just how deeply AI has embedded itself into American economic life. For the first time in Gallup&#x27;s measurement, <a href="https://www.gallup.com/workplace/704225/rising-adoption-spurs-workforce-changes.aspx">half of employed American adults say they use AI in their role at least a few times a year</a>, up from 46% the previous quarter. Frequent use is also increasing, with 13% of employees now saying they use AI daily and 28% reporting they use it a few times a week or more.</p><p>But the Gallup data, based on a <a href="https://www.gallup.com/699797/indicator-artificial-intelligence.aspx">February 2026 survey of 23,717 U.S. employees</a>, also suggests that the benefits of AI remain concentrated at the level of individual tasks rather than organizational transformation. Only about one in 10 employees in AI-adopting organizations strongly agree that artificial intelligence has transformed how work gets done. That finding is consistent with firm-level studies across the U.S., U.K., Germany, and Australia showing chief executives reporting minimal broad productivity effects from AI over the past three years — a notable gap between the hype cycle and operational reality.</p><p>The <a href="https://ramp.com/data/ai-index">Ramp methodology</a> captures a different but complementary signal.
Where Gallup asks employees whether they use AI, Ramp measures whether their employer is writing checks for it. The index counts corporate card and invoice-based payments, identifying firms as AI adopters if they have a positive transaction amount for an AI product or service in a given month. As Ramp&#x27;s methodology page notes, its results likely underestimate actual adoption because many employees use free AI tools or personal accounts for work tasks. Taken together, the two datasets paint a picture of AI that is ubiquitous in the American workplace but has not yet delivered on its promise to fundamentally transform how organizations operate.</p><h2><b>Why Anthropic&#x27;s biggest threat might be the success of its own best-selling product</b></h2><p>Perhaps the most striking aspect of Ramp&#x27;s analysis is its refusal to declare a lasting winner. Kharazian identified three specific risks facing Anthropic even as the company takes the lead — and the most serious one stems from a structural tension baked into the company&#x27;s business model.</p><p>Anthropic <a href="https://ramp.com/leading-indicators/ai-index-may-2026">makes more money when businesses purchase more tokens</a>, meaning the company is incentivized to drive users toward more expensive models even when cheaper ones are sufficient. This dynamic is already creating budget crises at major enterprises. Uber&#x27;s CTO revealed that <a href="https://finance.yahoo.com/sectors/technology/articles/ubers-anthropic-ai-push-hits-223109852.html">the company spent its entire 2026 AI budget in just four months</a>, largely on Claude Code and Cursor, with engineers reporting monthly API costs <a href="https://byteiota.com/uber-blows-2026-ai-budget-on-claude-code-in-4-months/">between $500 and $2,000 per person</a>. Adoption jumped from 32% to 84% of Uber engineers in a matter of months, and about 70% of committed code at Uber now comes from AI. 
The Uber case is a microcosm of a broader tension: Claude Code works — perhaps too well. When a productivity tool becomes so valuable that an organization&#x27;s $3.4 billion R&amp;D operation can&#x27;t afford to keep the lights on, the resulting cost scrutiny could push enterprises toward cheaper alternatives.</p><p>At the same time, quality and reliability have suffered under the weight of demand. In recent weeks, users have experienced <a href="https://www.cnbc.com/2026/05/06/anthropic-spacex-data-center-capacity.html">frequent outages</a>, <a href="https://www.anthropic.com/engineering/april-23-postmortem">rate limits</a>, and <a href="https://fortune.com/2026/04/14/anthropic-claude-performance-decline-user-complaints-backlash-lack-of-transparency-accusations-compute-crunch/">increasing dissatisfaction with Claude&#x27;s results</a>. Anthropic has responded by <a href="https://www.anthropic.com/engineering/april-23-postmortem">resetting usage limits</a> and by <a href="https://www.cnbc.com/2026/05/06/anthropic-spacex-data-center-capacity.html">striking a compute deal with SpaceX</a> to access more than 300 megawatts of new capacity at the Colossus 1 data center in Memphis. CEO Dario Amodei said the company saw &quot;<a href="https://venturebeat.com/technology/anthropic-says-it-hit-a-30-billion-revenue-run-rate-after-crazy-80x-growth">80x growth per year in revenue and usage</a>&quot; for Q1 2026, when it had only planned for 10x. And Ramp economist Rafael Hajjar found that Anthropic&#x27;s latest model update would triple token costs for any prompt that includes an image — a change that seems at odds with the company&#x27;s already-acute cost and compute problems.</p><h2><b>Open-source models and OpenAI&#x27;s Codex could quickly erode Anthropic&#x27;s narrow lead</b></h2><p>The <a href="https://ramp.com/leading-indicators/ai-index-may-2026">Ramp report</a> points to competitive dynamics that could reshape the market within months. 
Some of the fastest-growing vendors on Ramp&#x27;s platform in April were AI inference platforms that give companies access to cheap, open-source models — offering enterprises a way to get &quot;good enough&quot; AI at a fraction of the cost, particularly for routine tasks that don&#x27;t require frontier model capabilities.</p><p>OpenAI&#x27;s Codex presents an even more direct threat. By most measures, it is a strong product that does many of the <a href="https://composio.dev/content/claude-code-vs-openai-codex">same tasks as Claude Code at a lower price point</a> — and the switching cost between models is minimal. <a href="https://newsletter.pragmaticengineer.com/p/how-uber-uses-ai-for-development">Uber itself is already testing Codex as a hedge</a>, a move that could preview a broader pattern across enterprise tech. OpenAI also retains enormous structural advantages. <a href="https://searchengineland.com/chatgpt-900-million-weekly-active-users-470492">ChatGPT reached 900 million weekly active users by March 2026</a>, dwarfing Claude&#x27;s consumer footprint. Enterprise revenue now makes up more than 40% of OpenAI&#x27;s total and is on track to reach parity with consumer revenue by the end of 2026. And <a href="https://openai.com/index/accelerating-the-next-phase-ai/">OpenAI&#x27;s $122 billion funding round</a>, closed in March at an $852 billion valuation, gives it vast resources to compete on pricing, capacity, and product development.</p><p>Anthropic is not standing still on distribution. AWS recently launched <a href="https://aws.amazon.com/claude-platform/">Claude Platform on AWS</a>, giving enterprises direct access to Anthropic&#x27;s native platform through existing AWS credentials, billing, and access controls — a move that lowers procurement friction considerably. 
Anthropic has also announced <a href="https://www.anthropic.com/news/microsoft-nvidia-anthropic-announce-strategic-partnerships">compute agreements totaling billions of dollars</a> with Amazon, Google, Microsoft, Nvidia, and others, though <a href="https://www.anthropic.com/news/google-broadcom-partnership-compute">much of that capacity won&#x27;t come online until late 2026</a> or 2027. Anthropic is reportedly in talks to raise another $50 billion at a valuation approaching $900 billion.</p><h2><b>The unlikely reason businesses are choosing Claude over cheaper alternatives</b></h2><p>Beneath the spending data and market share charts lies a more intriguing question: Why are businesses choosing Anthropic over a cheaper, comparably performing alternative?</p><p>Kharazian explored this in his March analysis. <a href="https://www.leanware.co/insights/codex-vs-claude-code">Claude Code and OpenAI&#x27;s Codex are roughly comparable products</a> — on certain benchmarks, Codex is arguably better, and it&#x27;s also cheaper. Yet <a href="https://www.cnbc.com/2026/04/17/ai-tokens-anthropic-openai-nvidia.html">Anthropic can&#x27;t meet its own demand</a>. Every plan still has usage limits and rate caps. The company is actively turning away revenue because it doesn&#x27;t have the compute to serve it. Despite charging more for roughly equivalent performance, Anthropic&#x27;s demand is growing.</p><p>Kharazian suggested the answer might be cultural. Earlier this year, <a href="https://www.reuters.com/world/us-judge-blocks-pentagons-anthropic-blacklisting-now-2026-03-26/">Anthropic refused to agree to the Pentagon&#x27;s terms of use for Claude</a>, resulting in a blacklisting by the Department of Defense. OpenAI stepped in to offer its services in Anthropic&#x27;s place. In the wake of that episode, users rallied around Anthropic, and Claude temporarily surpassed ChatGPT on the App Store. 
The question, Kharazian wrote, is whether choosing an AI model is becoming less like an enterprise procurement decision and &quot;more like the <a href="https://time.com/article/2026/03/11/anthropic-claude-disruptive-company-pentagon/">green bubble/blue bubble distinction in iMessage</a>: a signal of identity as much as a choice of technology.&quot;</p><p>That observation may sound absurd for an enterprise software category. But Ramp&#x27;s data tells a story that pure economics cannot fully explain. In a market where the products perform similarly, where the cheaper option is arguably better on benchmarks, and where switching costs are negligible, something other than spreadsheet logic is driving the biggest shift in AI market share since the industry began. As Kharazian noted in his report: &quot;We have never seen a software industry as dynamic, where newcomers can disrupt market leaders in a matter of months, and where the pace of development overrides the typical forces of vendor stickiness.&quot;</p><p>That dynamism cuts both ways. The same forces that propelled a company from 8% to 34% market share in twelve months could just as easily work in reverse. Anthropic&#x27;s two-point lead was earned in the <a href="https://www.wsj.com/finance/stocks/the-1-6-trillion-meltdown-that-swept-through-software-stocks-86c8b3a2">most volatile software market in modern history</a> — and in this market, the distance between the throne and the floor has never been shorter.</p>]]></description>
            <author>michael.nunez@venturebeat.com (Michael Nuñez)</author>
            <category>Technology</category>
            <category>Business</category>
            <category>Data</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/4m169U8ajMEpWjEn6pQgzK/7690906968897882b8756a902d8848c6/Nuneybits_Vector_art_of_two_rising_lines_on_a_graph_burnt_orang_937edfc7-d114-495e-aad5-a2f1297757c6.webp?w=300&amp;q=30" length="0" type="image/webp"/>
        </item>
    </channel>
</rss>