<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>VentureBeat</title>
        <link>https://venturebeat.com/</link>
        <description>Transformative tech coverage that matters</description>
        <lastBuildDate>Mon, 04 May 2026 11:05:26 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright 2026, VentureBeat</copyright>
        <item>
            <title><![CDATA[Salesforce launches Agentforce Operations to fix the workflows breaking enterprise AI]]></title>
            <link>https://venturebeat.com/orchestration/salesforce-launches-agentforce-operations-to-fix-the-workflows-breaking-enterprise-ai</link>
            <guid isPermaLink="false">6YeC4lDpQXfmvh6I7tQzAM</guid>
            <pubDate>Fri, 01 May 2026 21:14:29 GMT</pubDate>
            <description><![CDATA[<p>Enterprise AI teams are hitting a wall — not because their models can&#x27;t reason, but because the workflows underneath them were never built for agents. Tasks fail, handoffs break, and the problem compounds as organizations push agents deeper into back-office systems. A new architectural layer is emerging to address it: workflow execution control planes that impose deterministic structure on processes agents are expected to run.</p><p>One of the companies bringing this to the forefront is Salesforce, with a new workflow platform that turns back-office workflows into a set of tasks for specialized agents to complete. Users can upload their processes or use one of the preset Blueprints provided by Salesforce, and Agentforce Operations will break it down for agents. </p><p>Salesforce senior vice president of Product, Sanjna Parulekar, told VentureBeat in an interview that the problem is that many enterprise workflows are not built for agents. “What we’ve observed with customers is that a lot of times, the brokenness in a process is probably in your product requirements document,” Parulekar said. “So when that’s uploaded into a product, it doesn’t quite work. We can optimize it and cut out some things and replace it with an agent.”</p><p>Without this control plane layer, enterprises risk deploying agents that increase cost rather than fixing their workflow problems.</p><h2>Making the workflow work for agents, not just humans</h2><p>Enterprises deploying agents are learning a costly lesson: Their workflows were designed around human judgment, not machine execution. 
Processes that evolved through years of workarounds — loosely defined steps, implicit decisions, coordination that depends on individuals knowing what to do next — break when agents are asked to follow them literally.</p><p>Even with all of an enterprise’s context at its fingertips, an AI system will have difficulty completing a task if it is not clear what it’s supposed to do. </p><p>Parulekar said her team found that focusing on what makes the process tick and breaking it down into more explicit steps and workflows makes the system more deterministic. Then, when platforms like Agentforce Operations introduce agents, those agents already know their specific tasks. </p><p>“It forces companies to rethink their processes and introduces observability into the mix because of the session tracing model in the system,” she said. </p><p>Parulekar said human checks can be built into the system, so the process is more transparent.</p><p>What makes this approach different from other workflow automation offerings is that it doesn’t rely on agents to decide what to do next; the system does. Unlike more traditional automation tools, which route tasks and agents based on probabilistic decision-making, it enforces execution within a pre-defined, deterministic structure.</p><h2>The problem it introduces</h2><p>Codifying a workflow doesn&#x27;t fix a broken one. If a process has flawed steps, encoding it for agents locks in the problem at scale. And once workflows are distributed across agents, the challenge shifts from execution to governance: who owns the process, who validates it, and how it evolves when business conditions change.</p><p>It puts the onus on teams to take a hard look at what works for them and what doesn’t.</p><p>Organizations need to consider that, along with the execution control plane offered by platforms like Agentforce Operations, someone should be made responsible for task completion and success. 
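</p><p>The contrast can be made concrete. The sketch below is hypothetical and is not Salesforce code; it illustrates the difference between a deterministic control plane, where the plan fixes the order of tasks and hand-offs up front, and an agent deciding its next step on the fly:</p>

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    """One explicit workflow step with a named owner for its outcome."""
    name: str
    owner: str  # the agent (or human) responsible for this step
    run: Callable[[dict], dict]

def execute_workflow(tasks: list[Task], state: dict) -> dict:
    # Deterministic execution: the control plane, not the agent, decides
    # what runs next, so every hand-off is explicit and traceable.
    for task in tasks:
        state = task.run(state)
        state.setdefault("trace", []).append((task.name, task.owner))
    return state

steps = [
    Task("extract_invoice", "parser-agent", lambda s: {**s, "invoice": "INV-1"}),
    Task("validate_totals", "finance-agent", lambda s: {**s, "valid": True}),
]
print(execute_workflow(steps, {})["trace"])
```

<p>The trace is the point: because the sequence is fixed before anything runs, each step already has an owner, which is the kind of accountability and observability this class of platform is meant to surface.</p><p>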
</p><p>Brandon Metcalf, founder and CEO of workforce orchestration company Asymbl, told VentureBeat in a separate interview that the key to both humans and agents following a workflow is a shared goal. </p><p>“You have to understand the goal or the agent or human won’t complete the task successfully,” Metcalf said. “Someone has to manage that outcome that has to be delivered. It can be a person or an agent.”</p><p>The bottleneck has moved. As Metcalf framed it, the question is no longer whether agents can reason through a task, it&#x27;s whether the workflow underneath them is coherent enough to execute. For enterprises that built their processes around human judgment and institutional memory, that&#x27;s a harder fix than swapping in a smarter model.</p>]]></description>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/3dLXAdbIIpPoxL2H1pLZTE/d655e36d5c316dfa1e9583472909b5e4/crimedy7_illustration_of_ai_orchestration_abstract_--ar_169_-_cb6c24fb-7c3b-414f-a8d4-7733592d8d94_2.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[200,000 MCP servers expose a command execution flaw that Anthropic calls a feature]]></title>
            <link>https://venturebeat.com/security/mcp-stdio-flaw-200000-ai-agent-servers-exposed-ox-security-audit</link>
            <guid isPermaLink="false">3E86laGp0iPhy36D7TdKo8</guid>
            <pubDate>Fri, 01 May 2026 20:35:46 GMT</pubDate>
            <description><![CDATA[<p>Anthropic created the <a href="https://modelcontextprotocol.io/">Model Context Protocol</a> as the open standard for AI agent-to-tool communication. OpenAI <a href="https://techcrunch.com/2025/03/26/openai-adopts-rival-anthropics-standard-for-connecting-ai-models-to-data/">adopted it in March 2025</a>. Google DeepMind followed. Anthropic <a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation">donated MCP to the Linux Foundation</a> in December 2025. Downloads crossed 150 million. Then four researchers at <a href="https://www.ox.security/blog/the-mother-of-all-ai-supply-chains-critical-systemic-vulnerability-at-the-core-of-the-mcp/">OX Security</a> found an architectural problem that affects all of them.</p><p>MCP&#x27;s STDIO transport, the default for connecting an AI agent to a local tool, executes any operating system command it receives. No sanitization. No execution boundary between configuration and command. A malicious command returns an error after the command has already run. The developer toolchain raises no flag.</p><p>OX Security researchers Moshe Siman Tov Bustan, Mustafa Naamnih, Nir Zadok and Roni Bar <a href="https://www.ox.security/blog/mcp-supply-chain-advisory-rce-vulnerabilities-across-the-ai-ecosystem/">scanned the ecosystem</a> and found 7,000 servers on public IPs with STDIO transport active — and estimate 200,000 total vulnerable instances extrapolated from that ratio. They confirmed arbitrary command execution on six live production platforms with paying customers. 
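</p><p>The mechanism is simple enough to sketch. The snippet below is an illustrative reduction written for this article, not MCP SDK code: an STDIO-style server definition names a command and its arguments, and whatever the configuration names is what runs.</p>

```python
import json
import subprocess

def launch_stdio_server(raw_config: str) -> str:
    """Illustrative reduction (not the MCP SDK): run the command named in
    an STDIO-style server config and return its stdout."""
    cfg = json.loads(raw_config)
    # No sanitization layer sits between the JSON and the subprocess call,
    # which is the crux of the finding: whoever writes the config chooses
    # what executes.
    proc = subprocess.run(
        [cfg["command"], *cfg.get("args", [])],
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip()

print(launch_stdio_server('{"command": "echo", "args": ["started"]}'))
```

<p>Swap <code>echo</code> for any binary on the machine and the call still succeeds, which is why write access to an MCP config file is, in practice, equivalent to shell access.</p><p>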
The research produced more than 10 CVEs rated high or critical across LiteLLM, LangFlow, Flowise, Windsurf, Langchain-Chatchat, Bisheng, DocsGPT, GPT Researcher, Agent Zero, LettaAI and others.</p><p>Kevin Curran, IEEE senior member and professor of cybersecurity at <a href="https://www.ulster.ac.uk/">Ulster University</a>, independently told <a href="https://www.infosecurity-magazine.com/news/systemic-flaw-mcp-expose-150/">Infosecurity Magazine</a> the research exposed &quot;a shocking gap in the security of foundational AI infrastructure.&quot;</p><p>Anthropic <a href="https://thehackernews.com/2026/04/anthropic-mcp-design-vulnerability.html">confirmed the behavior is by design</a> and declined to modify the protocol — characterizing STDIO&#x27;s execution model as a secure default and input sanitization as the developer&#x27;s responsibility. That characterization comes from OX; the only word Anthropic explicitly stated on the record is &quot;expected.&quot; Anthropic has not issued a standalone public statement and did not respond to VentureBeat&#x27;s request for comment.</p><p>OX says expecting 200,000 developers to sanitize inputs correctly is the problem. Anthropic&#x27;s strongest <a href="https://www.tomshardware.com/tech-industry/artificial-intelligence/anthropics-model-context-protocol-has-critical-security-flaw-exposed">technical counter</a>: sanitizing STDIO would either break the transport or move the payload one layer down. Both positions are technically coherent. The question is what to do while that debate plays out.</p><p>Every major outlet covered the disclosure. None built the prescriptive product-by-product audit a security director needs to triage her own MCP deployments. This piece does.</p><p>Five questions determine whether your MCP deployments are exposed, whether your patches hold, and what to do Monday morning.</p><h2>Am I exposed?</h2><p>If your teams deployed any MCP-connected AI agent using the default STDIO transport, yes. 
The insecurity is not a coding bug in any single product. It is a design default in Anthropic&#x27;s MCP specification that propagated into every official language SDK: Python, TypeScript, Java, and Rust. Every downstream project that trusted the protocol inherited it.</p><p>OX identified four exploitation families. Unauthenticated command injection through AI framework web interfaces, demonstrated against LangFlow and LiteLLM. Hardening bypasses in tools that implemented command allowlists, demonstrated against Flowise and Upsonic, where OX bypassed the allowlist through argument injection (npx -c). Zero-click prompt injection in AI coding IDEs, where malicious HTML modifies local MCP configuration files. Windsurf (CVE-2026-30615) was the only IDE where exploitation required zero user interaction, though Cursor, Claude Code, and Gemini-CLI are all vulnerable to the broader family. And malicious package distribution through MCP registries, where OX submitted a benign proof-of-concept to 11 registries, and nine accepted it without security review.</p><p>Carter Rees, VP of AI and Machine Learning at <a href="https://reputation.com/">Reputation</a> and member of the Utah AI Commission, told VentureBeat the framing needs to change entirely. &quot;MCP stdio is a privileged execution surface, not a connector. Enterprise teams should treat it like production shell access. Deny by default, allowlist, sandbox and stop assuming downstream input validation will hold at scale,&quot; Rees said.</p><p>The IDE family deserves particular attention because it hits developer workstations, not servers. A developer who visits an attacker-controlled website can trigger a modification to their local MCP configuration file — and in Windsurf&#x27;s case, the change executes immediately with no approval prompt. 
Cursor, Claude Code and Gemini-CLI require some form of user interaction, but if the UI presents a configuration change without surfacing the execution consequence, clicking &#x27;approve&#x27; does not constitute informed consent.</p><h2>Did my vendor patch?</h2><p>Some did. Some partially. Some have not confirmed. The matrix below maps each affected product against the exploitation family, patch state, and the gap that remains. The critical column is &quot;Protocol fix?&quot; Every row says no.</p><table><tbody><tr><td><p><b>Product</b></p></td><td><p><b>Exploit type</b></p></td><td><p><b>Patched?</b></p></td><td><p><b>Protocol fix?</b></p></td><td><p><b>The gap</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p><b>LiteLLM</b></p></td><td><p>Command injection via adapter UI</p></td><td><p>YES</p></td><td><p>NO</p></td><td><p>LiteLLM is fixed. New STDIO configs outside LiteLLM inherit the same insecure default.</p></td><td><p>Pin to v1.83.7-stable or later (CVE-2026-30623). Verify against GitHub advisory. Audit all other STDIO definitions.</p></td></tr><tr><td><p><b>LangFlow</b></p></td><td><p>RCE via public auto_login + STDIO</p></td><td><p>Partial</p></td><td><p>NO</p></td><td><p>Auth token freely available via public endpoint. STDIO executes whatever follows.</p></td><td><p>Block public auto_login. Sandbox all MCP services from the host OS.</p></td></tr><tr><td><p><b>Flowise / Upsonic</b></p></td><td><p>Allowlist bypass (npx -c argument injection)</p></td><td><p>Hardened, bypass confirmed</p></td><td><p>NO</p></td><td><p>Allowlist gives false confidence. OX bypassed it. Trivial.</p></td><td><p>Do not rely on command allowlists. Enforce process-level sandbox isolation.</p></td></tr><tr><td><p><b>Windsurf (CVE-2026-30615)</b></p></td><td><p>Zero-click prompt injection to local RCE</p></td><td><p>REPORTED, unconfirmed</p></td><td><p>NO</p></td><td><p>The only IDE with a true zero-interaction exploit. 
Hits developer workstations, not servers.</p></td><td><p>Disable automatic MCP server registration. Review all active configs manually.</p></td></tr><tr><td><p><b>Cursor / Claude Code / Gemini-CLI</b></p></td><td><p>Prompt injection to local MCP config modification</p></td><td><p>Cursor patched (CVE-2025-54136); others vary</p></td><td><p>NO</p></td><td><p>User interaction required, but config-change UI does not surface execution consequence. Approval does not equal informed consent.</p></td><td><p>Audit MCP config files (~/.cursor/mcp.json, equivalent paths). Disable auto-registration. Review all pending config changes before approval.</p></td></tr><tr><td><p><b>Langchain-Chatchat (CVE-2026-30617)</b></p></td><td><p>RCE via MCP STDIO transport</p></td><td><p>REPORTED, unconfirmed</p></td><td><p>NO</p></td><td><p>Downstream chatbot framework inherits the same STDIO default. Patch status unconfirmed.</p></td><td><p>Inventory all Langchain-Chatchat deployments. Sandbox from host OS. Monitor vendor advisory for patch.</p></td></tr><tr><td><p><b>MCP registries (9 of 11)</b></p></td><td><p>Accepted malicious PoC without review</p></td><td><p>N/A</p></td><td><p>NO</p></td><td><p>Registries lack submission security review. Install and risk a backdoor.</p></td><td><p>Use registries with documented submission review. Audit installs against known-good hashes.</p></td></tr></tbody></table><h2>Does the flaw survive the patch?</h2><p>Yes. Every product-level patch in the matrix addresses the specific entry point in that product. None of them changes the MCP protocol&#x27;s STDIO behavior. A security director who patches LiteLLM today and configures a new MCP STDIO server tomorrow will inherit the same insecure default on the new server. The patches are necessary. They are not sufficient.</p><p>This was predictable. 
When VentureBeat first <a href="https://venturebeat.com/security/mcp-shipped-without-authentication-clawdbot-shows-why-thats-a-problem">reported on MCP&#x27;s security flaws</a> in January, Merritt Baer, chief security officer at Enkrypt AI and former deputy CISO at AWS, warned: &quot;MCP is shipping with the same mistake we&#x27;ve seen in every major protocol rollout: insecure defaults. If we don&#x27;t build authentication and least privilege in from day one, we&#x27;ll be cleaning up breaches for the next decade.&quot; The <a href="https://labs.cloudsecurityalliance.org/research/csa-research-note-mcp-by-design-rce-ox-security-20260420-csa/">Cloud Security Alliance independently confirmed</a> OX&#x27;s findings in a separate research note and recommended organizations treat MCP-connected infrastructure as an active, unpatched threat. The defaults did not change. The attack surface grew.</p><p>Rees argued that Anthropic&#x27;s position, while internally consistent, does not survive contact with enterprise reality. &quot;It stops being a developer mistake and starts being a distributed failure mode when the same class of failure reproduces across that many independent implementations,&quot; he told VentureBeat. &quot;Guidance is not an architectural control. Relying on thousands of downstream implementers to consistently interpret a trust boundary is a known anti-pattern in enterprise security.&quot;</p><p>Anthropic updated its SECURITY.md file nine days after OX&#x27;s initial contact in January 2026 to note that STDIO adapters should be used with caution, but made no architectural changes. The researchers&#x27; <a href="https://www.theregister.com/2026/04/16/anthropic_mcp_design_flaw/">assessment of that update</a>: &quot;This change didn&#x27;t fix anything.&quot;</p><p>Rees took a more measured view. &quot;It&#x27;s worth giving Anthropic credit where it&#x27;s due,&quot; he told VentureBeat. 
&quot;After the disclosure, they updated their security guidance to recommend caution with stdio adapters. That&#x27;s a meaningful step even if researchers argue it falls short of a protocol-level fix.&quot;</p><h2>What changed at the protocol level?</h2><p>Nothing architectural. Anthropic has not implemented manifest-only execution, a command allowlist in the official SDKs, or any other protocol-level mitigation. OX recommended all three. The SECURITY.md guidance update was the only change. OX&#x27;s research began in November 2025 and included more than 30 responsible disclosure processes across the ecosystem before the April 15 publication.</p><p>The disagreement is substantive. Anthropic&#x27;s architectural argument deserves its full weight. STDIO is a local subprocess transport designed to launch processes on the machine that configured it. The trust boundary, in Anthropic&#x27;s model, sits with whoever controls the configuration file. If you can write to the MCP config, you are by definition someone authorized to execute commands on that machine. Under that logic, what looks like command injection is a feature working as intended. Restricting what STDIO can launch at the protocol level would either break the transport&#x27;s core function, since its purpose is to launch arbitrary local processes, or displace the attack surface into the launched process itself. The unopinionated-standard argument is also defensible: a universal protocol that hard-codes execution constraints stops being universal. OX&#x27;s counter, from their <a href="https://www.ox.security/blog/the-mother-of-all-ai-supply-chains-critical-systemic-vulnerability-at-the-core-of-the-mcp/">advisory</a>: &quot;Shifting responsibility to implementers does not transfer the risk. It just obscures who created it.&quot;</p><p>Do not wait for a protocol-level fix. 
Treat every MCP STDIO configuration as an untrusted input surface, regardless of which product it sits inside.</p><h2>Monday morning remediation sequence</h2><p><b>Enumerate.</b> Identify every MCP server deployment across dev, staging, and production. Search for MCP configuration files (mcp.json, mcp_config.json) in developer home directories and IDE config paths (~/.cursor/, ~/.codeium/windsurf/, ~/.config/claude-code/). List running processes that match MCP server binaries. Flag any using STDIO transport with public IP accessibility. OX found 7,000 on public IPs. Your environment may have instances you do not know about.</p><p><b>Patch.</b> Pin every affected product to its patched release. <a href="https://docs.litellm.ai/blog/mcp-stdio-command-injection-april-2026">LiteLLM v1.83.7-stable</a> includes the fix for CVE-2026-30623. DocsGPT, Flowise, and Bisheng have also shipped fixes. Windsurf and Langchain-Chatchat remain in reported state as of May 1, 2026. Cursor was patched against an earlier related disclosure (CVE-2025-54136) but inherits the same protocol default. Check each vendor&#x27;s advisory in the morning you execute this step.</p><p><b>Sandbox.</b> Isolate every MCP-enabled service from the host operating system. Never give a server full disk access or shell execution privileges. The Flowise/Upsonic allowlist bypass proves that restricting commands alone is not enough.</p><p><b>Audit registries.</b> Review every MCP server installed from a third-party registry. Nine of 11 registries accepted OX&#x27;s proof-of-concept without a security review. Use registries with documented submission review processes. Remove any MCP server whose origin you cannot verify.</p><p><b>Treat STDIO config as untrusted.</b> This step survives every future patch and every future product. The protocol-level default has not changed. Every STDIO server definition is a command execution surface. 
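</p><p>The enumeration step above can be sketched in a few lines of Python. The directory and file names below are drawn from the paths listed in this article and are examples rather than an exhaustive inventory:</p>

```python
from pathlib import Path

# Sketch of the "Enumerate" remediation step. The directories are example
# IDE config locations; extend both lists for your environment.
CONFIG_DIRS = [".cursor", ".codeium/windsurf", ".config/claude-code"]
CONFIG_NAMES = {"mcp.json", "mcp_config.json"}

def find_mcp_configs(home: Path) -> list[Path]:
    """Return every MCP config file found under the known directories."""
    hits: list[Path] = []
    for d in CONFIG_DIRS:
        base = home / d
        if base.is_dir():
            hits.extend(p for p in base.rglob("*") if p.name in CONFIG_NAMES)
    return sorted(hits)

if __name__ == "__main__":
    for cfg in find_mcp_configs(Path.home()):
        print(cfg)
```

<p>Each file the scan surfaces should then be reviewed for STDIO server definitions whose commands you do not recognize.</p><p>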
Treat it the same way you treat user input to a database query: assume it is hostile until validated.</p><h2>Your exposure cannot wait for a protocol fix</h2><p>Anthropic and OX Security disagree on where the responsibility for securing MCP&#x27;s STDIO transport belongs. That disagreement will not be resolved this week. What can be resolved this week is whether your MCP deployments are enumerated, patched, sandboxed, and treated as the untrusted execution surfaces they are.</p><p>As Rees put it: &quot;The core question here is architectural policy, not exploit payloads.&quot; Baer warned in January that insecure defaults would produce exactly this outcome. OX documented 200,000 servers running with a configuration field that doubles as an execution surface. The protocol&#x27;s designer says it is working as intended. Your Monday morning question is not who is right. It is which of your servers are exposed.</p>]]></description>
            <author>louiswcolumbus@gmail.com (Louis Columbus)</author>
            <category>Security</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/5zcdzz8S6R9xMQCRiaArOG/1290a9dbaee30dd37a47fefa5b656922/ANTHROPIC.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[The AI scaffolding layer is collapsing. LlamaIndex's CEO explains what survives.]]></title>
            <link>https://venturebeat.com/infrastructure/the-ai-scaffolding-layer-is-collapsing-llamaindexs-ceo-explains-what-survives</link>
            <guid isPermaLink="false">1f6j7zMDbdqY69JO8NyZg5</guid>
            <pubDate>Fri, 01 May 2026 18:01:52 GMT</pubDate>
            <description><![CDATA[<p>The scaffolding layer that developers once needed to ship LLM applications — indexing layers, query engines, retrieval pipelines, carefully orchestrated agent loops — is collapsing. And according to Jerry Liu, co-founder and CEO of <a href="http://www.llamaindex.ai/">LlamaIndex</a>, that&#x27;s not a problem. It&#x27;s the point.</p><p>“As a result, there&#x27;s less of a need for frameworks to actually help users compose these deterministic workflows in a light and shallow manner,” Jerry Liu, co-founder and CEO of LlamaIndex, explains in a new <a href="https://www.youtube.com/watch?v=HbXvX-KtkSs&amp;list=PLMQoSwszBxm5dCv2bdqGnJ0QAL9n7Ds4_&amp;index=1&amp;pp=iAQB">VentureBeat Beyond the Pilot podcast</a>. </p><div></div><h2><b>Context is becoming the moat</b></h2><p>Liu’s LlamaIndex is one of the foremost retrieval-augmented generation (RAG) frameworks connecting private, custom, and domain-specific data to LLMs. But even he acknowledges that these types of frameworks are becoming less relevant. </p><p>With every new release, models demonstrate incremental capabilities to reason over “massive amounts” of unstructured data, and they’re getting better at it than humans, he notes. They can be trusted to reason extensively, self-correct, and perform multi-step planning; Model Context Protocol (MCP) and Claude Agent Skills plug-ins allow models to discover and use tools without requiring integrations for every one independently. </p><p>Agent patterns have consolidated toward what Liu calls a <a href="https://www.anthropic.com/engineering/managed-agents">&quot;managed agent diagram&quot;</a> — a harness layer combined with tools, MCP connectors, and skills plug-ins, rather than custom-built orchestration for every workflow.</p><p>Further, coding agents excel at writing code, meaning devs don’t need to rely on extensive libraries. In fact, about 95% of LlamaIndex code is generated by AI. 
“Engineers are not actually writing real code,” Liu said. “They&#x27;re all typing in natural language.” This means the boundary between programmers and non-programmers is collapsing, because “the new programming language is essentially English.” </p><p>Instead of manual coding or struggling to understand API and document integration, devs can just point Claude Code at it. “This type of stuff was either extremely inefficient or just would break the agent three years ago,” said Liu. “It&#x27;s just way easier for people to build even relatively advanced retrieval with extremely simple primitives.”</p><p>So what’s the core differentiator when the stack collapses? </p><p>Context, Liu says. Agents need to be able to decipher file formats to extract the right information. Providing higher accuracy and cheaper parsing becomes key, and LlamaIndex is well-positioned here, he contends, because of its developments with agentic document processing via optical character recognition (OCR). </p><p>“We&#x27;ve really identified that there&#x27;s a core set of data that has been locked up in all these file format containers,” he said. Ultimately, “whether you use OpenAI Codex or Claude Code doesn&#x27;t really matter. The thing that they all need is context.”</p><h2><b>Keeping stacks modular</b></h2><p>There’s growing concern about builders like Anthropic locking in session data; in light of this, Liu emphasizes the importance of modularity and agnosticism. Builders shouldn’t bet on any one frontier model, or overbuild in a way that overcomplicates components of the stack. </p><p>Retrieval has evolved into “agent-plus-sandbox,” as he describes it, and enterprises must ensure that their code bases are free of tech debt and adaptable to changing patterns. They also have to acknowledge that some parts of the stack will eventually need to be thrown away as a matter of course. 
</p><p>“Because with every new model release, there&#x27;s always a different model that is kind of the winner,” Liu said. “You want to make sure you actually have some flexibility to take advantage of it.”</p><p>Listen to the podcast to hear more about: </p><ul><li><p>LlamaIndex’s beginnings as a ‘toy project’ with initially only about 40% accuracy; </p></li><li><p>How SaaS companies can tap into complicated workflows that must be standardized and repeatable for average knowledge workers;</p></li><li><p>Why vertical AI companies are taking off and why ‘build versus buy’ is still a very valid question in the agent age. </p></li></ul><p><b>You can also listen and subscribe to </b><a href="https://beyondthepilot.ubpages.com/"><b>Beyond the Pilot</b></a><b> on </b><a href="https://open.spotify.com/show/4Zti73yb4hmiTNa7pEYls4"><b>Spotify</b></a><b>, </b><a href="https://podcasts.apple.com/us/podcast/beyond-the-pilot-enterprise-ai-in-action/id1839285239"><b>Apple</b></a><b> or wherever you get your podcasts.</b></p>]]></description>
            <author>taryn.plumb@venturebeat.com (Taryn Plumb)</author>
            <category>Infrastructure</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/3PNVXTyfSXJhvGjd00Ia1C/d051128f97407ff20b6b4db84c907811/Upscaled_already.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
        <item>
            <title><![CDATA[xAI launches Grok 4.3 at an aggressively low price and a new, fast, powerful voice cloning suite]]></title>
            <link>https://venturebeat.com/technology/xai-launches-grok-4-3-at-an-aggressively-low-price-and-a-new-fast-powerful-voice-cloning-suite</link>
            <guid isPermaLink="false">mKkkuk33W7XJ6EYwsVxIR</guid>
            <pubDate>Fri, 01 May 2026 17:49:00 GMT</pubDate>
            <description><![CDATA[<p>While Elon Musk faces off against his former colleague and OpenAI co-founder Sam Altman in <a href="https://www.theverge.com/ai-artificial-intelligence/920775/evidence-exhibits-elon-musk-sam-altman-openai-trial">court</a>, Musk&#x27;s rival firm xAI, founded to take on OpenAI, isn&#x27;t slowing down on launching competitive new products and services.</p><p>Last night, <a href="https://x.com/elonmusk/status/2050034277375672520">xAI shipped a new, proprietary base large language model (LLM), Grok 4.3</a>, and a <a href="https://console.x.ai/team/default/voice/voice-library">new voice cloning suite</a> on the web. </p><p>The new products arrive after months of tumult at xAI that <a href="https://www.fastcompany.com/91531084/inside-the-xai-exodus">saw all of Musk&#x27;s 10 original co-founders of the lab</a> and dozens more researchers exit the firm, while Grok was eclipsed on performance by many new competing LLMs from the likes of OpenAI, Anthropic, Google, and Chinese firms DeepSeek, Moonshot (Kimi), Alibaba (Qwen), z.ai, and others. 
</p><p>While Grok 4.3 does mark a significant leap in performance on third-party benchmarks over its direct predecessor Grok 4.2, according to the independent AI model evaluation firm <a href="https://artificialanalysis.ai/models/grok-4-3">Artificial Analysis</a>, it remains below the state-of-the-art set by OpenAI and Anthropic&#x27;s latest models.</p><p>But the marquee feature of the Grok brand — other than Musk&#x27;s stated opposition to &quot;<a href="https://abc3340.com/news/nation-world/elon-musk-says-wokeness-is-divisive-exclusionary-and-hateful?photo=1">wokeness</a>&quot; and its more freewheeling personality and <a href="https://www.bbc.com/news/articles/cgk2lzmm22eo">image generation policy</a> — has increasingly been its low price point when accessed by developers and users via the xAI application programming interface (API). Grok 4.3 furthers that trend: it costs $1.25 per million input tokens and $2.50 per million output tokens (up to 200,000 input tokens, at which point costs double, a common pricing strategy among leading AI labs), compared to its direct predecessor Grok 4.2&#x27;s initial API pricing of $2/$6 per million input/output tokens.</p><p>According to <a href="https://grok.com/release-notes">xAI&#x27;s release notes</a>, Grok 4.3 began beta testing in April for subscribers to xAI&#x27;s <a href="https://grok.com/plans">SuperGrok</a> ($30 monthly) plan, and those of its sibling social network, <a href="https://x.com/i/premium_sign_up">X, through its Premium+ plan</a> ($40 monthly, with 50% off for the first two months). Now it&#x27;s available to all through the <a href="https://docs.x.ai/developers/models/grok-4.3">xAI API</a> and through partner <a href="https://x.com/OpenRouter/status/2049996465263759563">OpenRouter</a>. </p><h2><b>Reasoning baked-in and agentic tool-use capabilities</b></h2><p>At the core of Grok 4.3 is a fundamental shift in how the model processes information. 
Unlike previous iterations where &quot;chain-of-thought&quot; or reasoning could often be toggled or configured by effort levels, Grok 4.3 is built with reasoning as an active, permanent state. </p><p>This means the model is designed to &quot;think&quot; before it speaks for every query, a strategy intended to maximize factual accuracy and the handling of complex, multi-step instructions.</p><p>The model’s memory is equally expansive, featuring a <b>1 million-token context window</b>. To put this in perspective, a million tokens is roughly equivalent to several thick novels or the entire codebase of a mid-sized application. </p><p>This allows Grok 4.3 to maintain coherence over massive datasets, though xAI has implemented a &quot;Higher context pricing&quot; structure for requests that exceed the 200,000-token threshold. </p><p>This tiering suggests that while the &quot;long-term memory&quot; is available, the computational cost of managing that much information remains a significant overhead.</p><p>Technically, the model accepts both text and image inputs, outputting text. </p><p>It is specifically optimized for <b>agentic workflows</b>—scenarios where an AI is not just answering a question but acting as an autonomous agent to complete a task. </p><p>For the first time, Grok has access to the same tools and environments a human professional would use. Evidence of this shift is visible in early user interactions:</p><ul><li><p><b>Spreadsheet Engineering</b>: In one instance, the model spent <b>6 minutes and 22 seconds</b> in a &quot;thought&quot; phase to build a comprehensive OSRS Sailing Combat DPS analyzer. The resulting <code>.xlsx</code> file wasn&#x27;t a simple table but a multi-sheet dashboard including a &quot;Reference_Data&quot; set and a complex &quot;DPS_Calculator&quot; with formulaic auto-calculations.</p></li><li><p><b>Professional Documentation</b>: Grok now generates formatted PDFs, such as 12-page reports on SpaceX products. 
These documents incorporate branding, logos, hero images, and structured tables, moving well beyond the markdown blocks of previous iterations.</p></li><li><p><b>Visual Presentations</b>: The model can design 9-slide PowerPoint decks, utilizing a &quot;Sandwich Structure&quot; (dark titles/conclusions with light content) and integrating data-driven decision matrices and humor.</p></li></ul><p>However, its knowledge of the world is not infinite; the release notes list a knowledge cut-off date of December 2025. Yet, thanks to built-in web search, Grok can reference and use up-to-date information. </p><p>In fact, Grok 4.3 arrives with an enhanced ecosystem of tools designed to make it a functional digital employee. The xAI platform now offers a robust set of server-side tools that the model can invoke autonomously based on the complexity of the query.</p><ul><li><p><b>Web and X Search</b>: These tools allow Grok to bypass its knowledge cutoff by browsing the live internet or searching X (formerly Twitter) posts, user profiles, and threads.</p></li><li><p><b>Code Execution</b>: The model can run Python code in a sandboxed environment to solve mathematical problems or process data.</p></li><li><p><b>File and Collections Search</b>: A built-in Retrieval-Augmented Generation (RAG) system allows users to query uploaded document collections or search through specific file attachments.</p></li></ul><h2><b>xAI&#x27;s Custom Voices let you clone your voice at high quality in a minute or two</b></h2><p>Beyond text, xAI has introduced <b>Custom Voices</b>, a voice-cloning API and a web-based voice creation suite. </p><p>This product allows developers to clone a voice from a reference audio clip as short as 120 seconds. Once cloned, the &quot;voice ID&quot; can be used across xAI’s Text-to-Speech (TTS) and Voice Agent APIs.</p><p>xAI&#x27;s documentation emphasizes that this is not merely about timbre; the model is designed to pick up delivery patterns. 
</p><p>If a user records a reference clip in a &quot;customer support&quot; style, the resulting AI voice will mimic that helpful, professional inflection. </p><p>Despite the creative potential, xAI has placed strict geographic limits on this feature, making it available only in the United States, and even there excluding Illinois, owing to the state&#x27;s biometric privacy regulations.</p><p>While the console playground is open for general use, programmatic access via the <code>POST /v1/custom-voices</code> endpoint is currently gated to teams on an Enterprise plan. </p><p>I tried it myself: after moving through the requisite voice sampling screens on the web (the tool asks you to read aloud several passages of unrelated dialog), I had a copy of my voice that sounded eerily like my own and accurately pronounced new words the same way I would when reading aloud from a new script it was given.</p><p>You can delete your custom voices in one click on xAI&#x27;s Custom Voices web application and create up to 30 new ones at a time.</p><p>In terms of licensing, the Custom Voices feature is strictly &quot;scoped to your team&quot; and is never made available to other users, ensuring a private, commercial license for corporate assets. </p><p>Access to the new Voice Agent API (<code>grok-voice-think-fast-1.0</code>) is billed at a flat rate of $3.00 per hour ($0.05 per minute) for speech-to-speech interactions. 
That places it at the low-to-medium end of the cost range for competing voice agent services, according to my research:</p><table><tbody><tr><td><p><b>Service</b></p></td><td><p><b>Price per 1k Characters</b></p></td><td><p><b>Estimated Cost per Minute</b></p></td><td><p><b>Estimated Cost per Hour</b></p></td></tr><tr><td><p><b>OpenAI TTS (Standard)</b></p></td><td><p>$0.015</p></td><td><p>~$0.015</p></td><td><p>~$0.90</p></td></tr><tr><td><p><b>OpenAI TTS (HD)</b></p></td><td><p>$0.030</p></td><td><p>~$0.030</p></td><td><p>~$1.80</p></td></tr><tr><td><p><b>Grok Voice Agent</b></p></td><td><p>N/A (flat rate)</p></td><td><p><b>$0.05</b></p></td><td><p><b>$3.00</b></p></td></tr><tr><td><p><b>ElevenLabs (Starter)</b></p></td><td><p>~$0.30</p></td><td><p>~$0.30</p></td><td><p>~$18.00</p></td></tr><tr><td><p><b>ElevenLabs (Pro)</b></p></td><td><p>~$0.18</p></td><td><p>~$0.18</p></td><td><p>~$10.80</p></td></tr><tr><td><p><b>Play.ht</b></p></td><td><p>~$0.20</p></td><td><p>~$0.20</p></td><td><p>~$12.00</p></td></tr><tr><td><p><b>Azure/Google Cloud</b></p></td><td><p>$0.016 - $0.024</p></td><td><p>~$0.02</p></td><td><p>~$1.00 - $1.50</p></td></tr></tbody></table><p>Complementing this is the standalone Text-to-Speech (TTS) service, which offers five distinct voices (Eve, Ara, Rex, Sal, and Leo) and is priced at $4.20 per 1 million characters. </p><p>For transcription needs, the Speech-to-Text (STT) API provides real-time streaming at $0.20 per hour, while batch processing is available at a discounted rate of $0.10 per hour. </p><p>To ensure security for client-side applications, xAI utilizes Ephemeral Tokens, allowing for secure WebSocket connections without exposing primary API keys.</p><p>Once created, these voices are private to the user&#x27;s team and can be used across all voice APIs by referencing a unique 8-character alphanumeric <code>voice_id</code>. 
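For reference, the per-minute and per-hour figures in the comparison table above can be reproduced with a simple conversion that assumes roughly 1,000 characters of text per spoken minute; that ratio is my own rule-of-thumb assumption, not a vendor-published figure, and the helper functions below are hypothetical:

```python
# Convert per-1,000-character TTS pricing into per-minute and per-hour
# estimates. CHARS_PER_MINUTE is an assumption (~1,000 characters of
# text per minute of generated speech), not a vendor-published figure.
CHARS_PER_MINUTE = 1_000

def tts_cost_per_minute(price_per_1k_chars: float) -> float:
    return price_per_1k_chars * (CHARS_PER_MINUTE / 1_000)

def tts_cost_per_hour(price_per_1k_chars: float) -> float:
    return tts_cost_per_minute(price_per_1k_chars) * 60

# OpenAI TTS (Standard) at $0.015 per 1k characters:
print(round(tts_cost_per_hour(0.015), 2))   # ~0.9 per hour

# xAI's standalone TTS at $4.20 per 1M characters, i.e. $0.0042 per 1k:
print(round(tts_cost_per_hour(0.0042), 3))  # ~0.252 per hour
```

At that rate, the standalone TTS service works out to roughly $0.25 per hour of generated speech, an order of magnitude below the $3.00-per-hour Voice Agent rate.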
</p><p>For highly regulated sectors, xAI maintains production-ready standards, including SOC 2 Type II auditing, HIPAA eligibility for healthcare workloads, and GDPR compliance.</p><h2><b>Aggressively low API pricing as a differentiator</b></h2><p>The most aggressive aspect of the Grok 4.3 announcement is its pricing structure. Bindu Reddy, CEO of enterprise assistant startup Abacus AI, <a href="https://x.com/bindureddy/status/2050028784242536585">noted on X</a> that the model is &quot;as smart as Sonnet 4.6 and 5x cheaper and faster&quot;. </p><p>The standard API rates are set at $1.25 per million input tokens and $2.50 per million output tokens. This reflects a significant reduction in cost compared to its predecessor, Grok 4.2, with Artificial Analysis reporting an approximately <b>40% lower input price and 60% lower output price</b>.</p><p>According to our calculations at VentureBeat, that places Grok 4.3 firmly in the lowest-cost half of all major foundation models, far closer to Chinese open source offerings than its U.S. 
proprietary rivals:</p><table><tbody><tr><td><p><b>Model</b></p></td><td><p><b>Input</b></p></td><td><p><b>Output</b></p></td><td><p><b>Total Cost</b></p></td><td><p><b>Source</b></p></td></tr><tr><td><p>MiMo-V2.5 Flash</p></td><td><p>$0.10</p></td><td><p>$0.30</p></td><td><p>$0.40</p></td><td><p><a href="https://platform.xiaomimimo.com/docs/en-US/pricing">Xiaomi MiMo</a></p></td></tr><tr><td><p>Grok 4.1 Fast</p></td><td><p>$0.20</p></td><td><p>$0.50</p></td><td><p>$0.70</p></td><td><p><a href="https://docs.x.ai/docs/pricing">xAI</a></p></td></tr><tr><td><p>MiniMax M2.7</p></td><td><p>$0.30</p></td><td><p>$1.20</p></td><td><p>$1.50</p></td><td><p><a href="https://platform.minimax.io/docs/guides/models-intro">MiniMax</a></p></td></tr><tr><td><p>MiMo-V2.5</p></td><td><p>$0.40</p></td><td><p>$2.00</p></td><td><p>$2.40</p></td><td><p><a href="https://platform.xiaomimimo.com/docs/en-US/pricing">Xiaomi MiMo</a></p></td></tr><tr><td><p>Gemini 3 Flash</p></td><td><p>$0.50</p></td><td><p>$3.00</p></td><td><p>$3.50</p></td><td><p><a href="https://ai.google.dev/pricing">Google</a></p></td></tr><tr><td><p>Kimi-K2.5</p></td><td><p>$0.60</p></td><td><p>$3.00</p></td><td><p>$3.60</p></td><td><p><a href="https://platform.moonshot.cn/docs/pricing">Moonshot</a></p></td></tr><tr><td><p><b>Grok 4.3</b></p></td><td><p><b>$1.25</b></p></td><td><p><b>$2.50</b></p></td><td><p><b>$3.75</b></p></td><td><p><b></b><a href="https://docs.x.ai/developers/models/grok-4.3"><b>xAI</b></a></p></td></tr><tr><td><p>GLM-5</p></td><td><p>$1.00</p></td><td><p>$3.20</p></td><td><p>$4.20</p></td><td><p><a href="https://docs.z.ai/guides/overview/pricing">Z.ai</a></p></td></tr><tr><td><p>GLM-5-Turbo</p></td><td><p>$1.20</p></td><td><p>$4.00</p></td><td><p>$5.20</p></td><td><p><a href="https://docs.z.ai/guides/overview/pricing">Z.ai</a></p></td></tr><tr><td><p>DeepSeek V4 Pro</p></td><td><p>$1.74</p></td><td><p>$3.48</p></td><td><p>$5.22</p></td><td><p><a 
href="https://api-docs.deepseek.com/quick_start/pricing">DeepSeek</a></p></td></tr><tr><td><p>GLM-5.1</p></td><td><p>$1.40</p></td><td><p>$4.40</p></td><td><p>$5.80</p></td><td><p><a href="https://docs.z.ai/guides/overview/pricing">Z.ai</a></p></td></tr><tr><td><p>Claude Haiku 4.5</p></td><td><p>$1.00</p></td><td><p>$5.00</p></td><td><p>$6.00</p></td><td><p><a href="https://www.anthropic.com/pricing">Anthropic</a></p></td></tr><tr><td><p>Qwen3-Max</p></td><td><p>$1.20</p></td><td><p>$6.00</p></td><td><p>$7.20</p></td><td><p><a href="https://www.alibabacloud.com/help/en/model-studio/developer-reference/model-pricing">Alibaba Cloud</a></p></td></tr><tr><td><p>Gemini 3 Pro</p></td><td><p>$2.00</p></td><td><p>$12.00</p></td><td><p>$14.00</p></td><td><p><a href="https://ai.google.dev/pricing">Google</a></p></td></tr><tr><td><p>GPT-5.4</p></td><td><p>$2.50</p></td><td><p>$15.00</p></td><td><p>$17.50</p></td><td><p><a href="https://openai.com/api/pricing/">OpenAI</a></p></td></tr><tr><td><p>Claude Opus 4.7</p></td><td><p>$5.00</p></td><td><p>$25.00</p></td><td><p>$30.00</p></td><td><p><a href="https://platform.claude.com/docs/en/about-claude/pricing">Anthropic</a></p></td></tr><tr><td><p>GPT-5.5</p></td><td><p>$5.00</p></td><td><p>$30.00</p></td><td><p>$35.00</p></td><td><p><a href="https://openai.com/api/pricing/">OpenAI</a></p></td></tr></tbody></table><p>However, the &quot;reasoning&quot; nature of the model introduces a new billing category: Reasoning tokens. </p><p>These are tokens generated during the model&#x27;s internal thinking process and are billed at the same rate as standard completion tokens. Effectively, users pay for the AI to &quot;think&quot; before it provides the final answer. 
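Putting the published rates together, a rough per-request cost estimate can be sketched as follows. This is a simplification based on the published numbers; treating the higher-context tier as a doubling of both rates above 200,000 input tokens is my own assumption, and the function name is hypothetical:

```python
# Rough per-request cost estimator for Grok 4.3's published API rates:
# $1.25 per million input tokens and $2.50 per million output tokens,
# with reasoning tokens billed at the same rate as completion tokens.
# Assumption (not confirmed by xAI): both rates simply double once a
# request exceeds the 200,000-input-token threshold.
INPUT_RATE = 1.25 / 1_000_000    # dollars per input token
OUTPUT_RATE = 2.50 / 1_000_000   # dollars per output/reasoning token
TIER_THRESHOLD = 200_000         # input tokens before higher-context pricing

def estimate_cost(input_tokens: int, output_tokens: int,
                  reasoning_tokens: int = 0) -> float:
    multiplier = 2 if input_tokens > TIER_THRESHOLD else 1
    billed_output = output_tokens + reasoning_tokens
    return multiplier * (input_tokens * INPUT_RATE
                         + billed_output * OUTPUT_RATE)

# 100k tokens in, 2k tokens out, plus 8k tokens of internal "thinking":
print(f"${estimate_cost(100_000, 2_000, 8_000):.4f}")  # $0.1500
```

Note that in this example the 8,000 hidden reasoning tokens cost four times as much as the visible 2,000-token answer, which is why always-on reasoning models can be more expensive in practice than their headline rates suggest.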
xAI has also introduced several unique fee structures:</p><ul><li><p><b>Prompt Caching</b>: Repeated prompts are significantly cheaper, at <b>$0.20 per million tokens</b>, incentivizing developers to reuse context.</p></li><li><p><b>Tool Invocations</b>: While token usage for tools is billed at standard rates, the act of invoking a tool carries a flat fee of $5.00 per 1,000 calls for Web Search or Code Execution, and $10.00 for File Attachments.</p></li><li><p><b>Usage Guideline Violation Fee</b>: In a move that may set a new industry precedent, xAI charges a <b>$0.05 fee</b> for requests that are blocked by its safety filters before generation even begins.</p></li></ul><p>The model itself remains accessible via a standard commercial API, with xAI recommending that all developers migrate to <code>grok-4.3</code> as its &quot;most intelligent and fastest model&quot;.</p><h2><b>Third-party benchmark evaluations and analysis</b></h2><p>The reception of Grok 4.3 has been polarized, depending largely on the specific use case. Professional benchmarkers and developers have highlighted a &quot;stark gap&quot; between the model&#x27;s domain-specific strengths and its general reasoning consistency.</p><div></div><p>According to <a href="https://x.com/ValsAI/status/2049979820227063953">independent AI evaluation firm Vals AI</a>, Grok 4.3 has taken the top spot on several specialized indices. It currently ranks #1 on CaseLaw v2 (79.3% accuracy) and #1 on CorpFin. </p><p>This 25-point jump in legal reasoning over Grok 4.2 suggests that the &quot;always-on reasoning&quot; architecture is particularly well-suited for the dense, logical structures of law and finance. 
</p><p><a href="https://x.com/ArtificialAnlys/status/2049987001655714250">Artificial Analysis corroborated</a> this performance, noting a massive improvement in agentic tasks: the model scored an Elo of 1500 on the GDPval-AA benchmark, surpassing competitors like Gemini 3.1 Pro and GPT-5.4 mini.</p><p>Conversely, users focused on general-purpose agents and coding have highlighted deficiencies. </p><p>Andon Labs, a company that automates brick-and-mortar retail with AI, <a href="https://x.com/andonlabs/status/2050056965460734325">reported that Grok 4.3 is a &quot;big regression&quot;</a> on the Vending-Bench 2, which measures an AI&#x27;s ability to take consistent actions in a simulation. </p><p>They colorfully described the model as having &quot;narcolepsy problems,&quot; preferring to remain inactive for multiple simulation days rather than taking the required actions.</p><p>The sentiment was echoed by Vals AI, which noted that while the model improved in some coding areas, it remains weak on general coding tasks and &quot;struggles with difficult math problems,&quot; scoring only 11% on ProofBench.</p><h2><b>Should your enterprise use Grok 4.3?</b></h2><p>The launch of Grok 4.3 represents a calculated bet by xAI that the market wants <b>specialized brilliance</b> and <b>extreme cost efficiency</b> over a perfectly balanced generalist. </p><p>By achieving a score of 53 on the Artificial Analysis Intelligence Index while remaining on the &quot;Pareto frontier&quot; of cost-per-intelligence, xAI is positioning itself as the &quot;value&quot; leader for enterprise applications in legal and financial tech.</p><p>The &quot;always-on reasoning&quot; is a double-edged sword. 
While it provides the depth needed to navigate complex case law, the community reports of &quot;narcolepsy&quot; suggest that a model that is always &quot;thinking&quot; may occasionally think itself into a state of paralysis, or at least a state of excessive caution that inhibits agentic action.</p><p>In addition, prior Grok model scandals are nearly certain to give some enterprises pause when considering adoption. These include an X chatbot version referring to itself as &quot;<a href="https://www.npr.org/2025/07/09/nx-s1-5462609/grok-elon-musk-antisemitic-racist-content">MechaHitler</a>&quot; and posting antisemitic content, <a href="https://www.nbcnews.com/tech/tech-news/musks-ai-chatbot-grok-xai-making-sexual-deepfakes-imagine-rcna265855">sexualized deepfake imagery generation</a> and the resulting <a href="https://www.bbc.com/news/articles/clye99wg0y8o">investigations</a>, and <a href="https://venturebeat.com/ai/elon-musks-xai-tries-to-explain-groks-south-african-race-relations-freakout-the-other-day">references to racial conflicts</a> and <a href="https://venturebeat.com/ai/musks-attempts-to-politicize-his-grok-ai-are-bad-for-users-and-enterprises-heres-why">right-wing dog whistle framing of social issues</a> that appear to mirror many of founder Musk&#x27;s own positions, to the point that the model was, at one point, <a href="https://www.cnbc.com/2025/07/11/grok-4-appears-to-reference-musks-views-when-answering-questions-.html">checking Musk&#x27;s own X account before responding in its X implementation</a>. It&#x27;s unclear whether any of those issues remain with Grok 4.3, but one user did note that <a href="https://x.com/lefthanddraft/status/2050021868229611654">Grok&#x27;s system prompt appears</a> to instruct it: &quot;you do not assign broad positive/negative utility functions to groups of people.&quot;</p><p>For developers, the decision to adopt Grok 4.3 will likely come down to the nature of their data. 
For those needing to process a million tokens of legal documents at a fraction of the cost of Claude 4.6 or GPT-5.5, Grok 4.3 is a clear front-runner. </p><p>For those building high-frequency autonomous agents or complex math solvers, the &quot;narcolepsy&quot; and coding regressions suggest that xAI&#x27;s latest model may still need a few more &quot;tuning passes&quot;.</p><p>As <a href="https://x.com/OpenRouter/status/2049996465263759563">OpenRouter noted on X upon making the model live</a>, the &quot;large jump in agentic performance&quot; at a lower price point is an undeniable milestone. Whether that performance can be sustained across all domains remains the primary question for the summer of 2026.</p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/6c9N7ubweMcf8hAUjcDZIH/fb25ad47038633db57b73f2f45bc3225/FkIIbTjMYUsldxbqMHtky_g5BcjizZ.jpg?w=300&amp;q=30" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Hidden IT problems are quietly creating risk, shadow IT, and lost productivity]]></title>
            <link>https://venturebeat.com/technology/hidden-it-problems-are-quietly-creating-risk-shadow-it-and-lost-productivity</link>
            <guid isPermaLink="false">4P4nzlZd3oWkp4Rc6fzWGs</guid>
            <pubDate>Fri, 01 May 2026 13:03:15 GMT</pubDate>
            <description><![CDATA[<p><i>Presented by TeamViewer</i></p><hr/><p>Enterprise technology failures are largely invisible. <a href="https://ad.doubleclick.net/ddm/trackclk/N2700600.133013VENTUREBEAT/B35613896.445429744;dc_trk_aid=639044651;dc_trk_cid=253275872;dc_lat=;dc_rdid=;tag_for_child_directed_treatment=;tfua=;gdpr=${GDPR};gdpr_consent=${GDPR_CONSENT_755};ltd=;dc_tdv=1">Research from TeamViewer</a>, based on a global survey of 4,200 managers and employees, finds that the majority of digital dysfunction never reaches the IT help desk. </p><p>Employees work around slow applications, failed logins, and intermittent glitches rather than reporting them, leaving organizations without an accurate picture of how their technology is performing. The cumulative cost is significant: employees lose an average of 1.3 workdays per month to digital friction, with impacts ranging from delayed projects and lost revenue to increased employee turnover.</p><p>The research, which surveyed managers and employees across nine countries, confirms what many have long suspected: the productivity loss from digital friction is significant, and most of it never surfaces in an IT support queue, says Andrew Hewitt, VP of strategic technology at TeamViewer.</p><p>“Enterprise outages are visible because they trigger clear, system-level failures,” Hewitt says. “But much of the real disruption happens earlier, in the form of digital friction: slow apps, login issues, or intermittent glitches that don’t cross alert thresholds. These smaller issues often go unreported or are normalized by employees, even though they quietly drain productivity.”</p><h2>What is digital friction and why does it go unreported?</h2><p>The most common sources of friction — connectivity failures, software crashes, hardware problems, and authentication issues — aren’t edge-case scenarios, but everyday experiences employees have learned to absorb without escalating. 
Connectivity problems were the most widespread, with nearly half identifying them as the top productivity killer among common technology issues.</p><p>That tendency to absorb rather than report is central to the problem. Many workers don’t trust their IT team to resolve issues quickly or effectively, so when a login fails or an application stalls mid-task, the path of least resistance is to restart the device, switch tools, or use a personal phone.</p><p>“Employees are under more pressure than ever to prove output,” Hewitt says. “When reporting feels unlikely to result in a quick resolution, it creates a false sense of stability at the system level while the employee experience quietly deteriorates.”</p><h2>How much productivity does digital friction cost organizations?</h2><p>The business consequences extend beyond inconvenience. Many organizations report delays in critical operations, revenue loss, and lost customers as a result of IT dysfunction. Most respondents lose time each month, and few expect improvement, citing increasing complexity of workplace technology as a primary concern.</p><p>The human cost runs parallel. Workers link digital friction to frustration, decreased motivation, and burnout, and many believe it contributes to turnover, with onboarding replacements stretching to eight weeks or more.</p><p>&quot;Employees are happiest when they feel productive and accomplished at the end of the day,&quot; Hewitt says. &quot;When people can&#x27;t make progress in their day-to-day work, frustration builds and burnout follows. 
Great technology might not be a main attractor of talent, but bad technology can certainly play a role in driving it away.&quot;</p><h2>Why employees use personal devices and unauthorized tools instead of reporting IT problems</h2><p>When workplace technology consistently fails to meet employee needs, workers find alternatives, with a substantial share of respondents admitting to using personal devices or unauthorized applications as workarounds. That&#x27;s the entry point for shadow IT, or the use of unapproved hardware, software, or cloud services outside IT&#x27;s visibility and control. While employees turn to these tools simply to stay productive, they introduce security vulnerabilities, data leakage risks, and compliance gaps that IT teams may not discover until a breach occurs.</p><p>“Quite simply, it demonstrates that the IT environment is not meeting the employees’ needs,” Hewitt said. “While this helps maintain short-term productivity, it introduces significant risks and pushes work outside of IT’s visibility and control.”</p><p><a href="https://ad.doubleclick.net/ddm/trackclk/N2700600.133013VENTUREBEAT/B35613896.445589096;dc_trk_aid=639045293;dc_trk_cid=253275872;dc_lat=;dc_rdid=;tag_for_child_directed_treatment=;tfua=;gdpr=${GDPR};gdpr_consent=${GDPR_CONSENT_755};ltd=;dc_tdv=1">TeamViewer ONE</a> addresses this by combining remote connectivity with real-time endpoint monitoring, giving IT teams the ability to detect and resolve device and application issues before employees reach for an alternative. When the underlying environment is stable and support is fast, the impulse to work around it diminishes.</p><h2>How fragmented IT infrastructure creates blind spots across devices, apps, and networks</h2><p>Addressing digital friction at scale requires more than faster help desk response times. Traditional metrics such as mean time to resolution and ticket volume capture only a fraction of actual issues. 
A more complete picture requires measuring lost time, interrupted workflows, and employee sentiment across devices, applications, and network environments.</p><p>“Leaders need to move beyond measuring performance through IT tickets alone,” Hewitt said. “Performance should be viewed through the lens of employee experience and real-time digital workplace data.”</p><p>Fragmented infrastructure makes this difficult. When devices, applications, and networks operate in separate silos, IT teams struggle to trace root causes or identify systemic issues before they spread, often responding to symptoms rather than underlying problems.</p><p>TeamViewer ONE is designed to close that gap, integrating digital employee experience analytics, remote support, and device management into a single platform. Instead of piecing together signals from disconnected tools, IT teams get a consolidated view of endpoint health, application performance, and network conditions across the entire organization.</p><h2>How organizations can shift from reactive IT support to proactive system monitoring</h2><p>Achieving proactive IT is not a single-step transformation. Hewitt describes it as a progression: starting with endpoint management and security, building toward real-time visibility into the digital employee experience, and ultimately using automation and AI to resolve issues before they reach employees.</p><p>TeamViewer AI is built to support each stage of that progression, using continuous monitoring to surface anomalies and correlate signals across the digital environment, identifying patterns of poor experience before they escalate. 
When issues are detected, it suggests remediations, generates scripts to fix problems autonomously, and handles routine tasks such as common troubleshooting without requiring IT intervention, shifting the workload from reactive firefighting toward proactive oversight.</p><p>And while AI&#x27;s effectiveness depends on the completeness of the data it works with, consolidating onto a platform like TeamViewer ONE removes that limitation by giving AI a complete, real-time data foundation to work from.</p><h2>How system performance lays the foundation for productivity, retention, and competitive advantage</h2><p>TeamViewer ONE isn&#x27;t a wholesale replacement of existing IT infrastructure, but a unifying layer that connects insight with action, which enables organizations to ramp up productivity, improve retention, and ultimately realize a significant competitive advantage. It begins with visibility into what is actually causing friction across their environment. From there, leaders can use that data to prioritize fixes, and then scale remediation through automation as confidence and capability grow.</p><p>&quot;Reducing digital friction isn&#x27;t about overhauling everything at once,&quot; Hewitt said. &quot;Leaders should start small, gain visibility into what&#x27;s actually causing friction, fix the biggest pain points, then scale those improvements through automation and AI. 
Even incremental progress can make an impact on employee engagement and productivity.&quot;</p><p><i>Dig deeper: </i><a href="https://ad.doubleclick.net/ddm/trackclk/N2700600.133013VENTUREBEAT/B35613896.445589102;dc_trk_aid=639034578;dc_trk_cid=253275872;dc_lat=;dc_rdid=;tag_for_child_directed_treatment=;tfua=;gdpr=${GDPR};gdpr_consent=${GDPR_CONSENT_755};ltd=;dc_tdv=1"><i>Fix it before they feel it from TeamViewer</i></a><i>.</i></p><hr/><p><i>Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact </i><a href="mailto:sales@venturebeat.com"><i><u>sales@venturebeat.com</u></i></a><i>.</i></p>]]></description>
            <category>Technology</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/eFxVmpdHxuJILeB98CSqk/4cc177a46197d0a5c6bc968e3da4495b/AdobeStock_987891135.jpeg?w=300&amp;q=30" length="0" type="image/jpeg"/>
        </item>
        <item>
            <title><![CDATA[Alibaba's Metis agent cuts redundant AI tool calls from 98% to 2% — and gets more accurate doing it]]></title>
            <link>https://venturebeat.com/orchestration/alibabas-metis-agent-cuts-redundant-ai-tool-calls-from-98-to-2-and-gets-more-accurate-doing-it</link>
            <guid isPermaLink="false">578O7IXrcxoBTEvWWPuepu</guid>
            <pubDate>Thu, 30 Apr 2026 20:51:26 GMT</pubDate>
            <description><![CDATA[<p>One of the key challenges of building effective AI agents is teaching them to choose between using external tools or relying on their internal knowledge. But large language models are often trained to blindly invoke tools, which causes latency bottlenecks, unnecessary API costs, and degraded reasoning caused by environmental noise. </p><p>To overcome this challenge, researchers at Alibaba introduced <a href="https://arxiv.org/abs/2604.08545v1">Hierarchical Decoupled Policy Optimization</a> (HDPO), a reinforcement learning framework that trains agents to balance both execution efficiency and task accuracy. </p><p>Metis, a multimodal model they trained using this framework, reduces redundant tool invocations from 98% to just 2% while establishing new state-of-the-art reasoning accuracy across key industry benchmarks. This framework helps create AI agents that are not trigger-happy and know when to abstain from using tools, enabling the development of responsive and cost-effective agentic systems.</p><h2>The metacognitive deficit</h2><p>Current agentic models face what the researchers call a “profound metacognitive deficit.” The models have a hard time deciding when to use their internal parametric knowledge versus when to query an external utility. As a result, they blindly invoke tools and APIs, like web search or code execution, even when the user&#x27;s prompt already contains all the necessary information to resolve the task.</p><p>This trigger-happy tool-calling behavior creates severe operational hurdles for real-world applications. Because the models are trained to focus almost entirely on task completion, they are indifferent to latency. These agents frequently hit exorbitant tool call rates. 
Every unnecessary external API call introduces a serial processing bottleneck, turning a technically capable AI into a sluggish system that frustrates users and burns through tool budgets.</p><p>At the same time, burning computational resources on excessive tool use does not translate to better reasoning. Redundant tool interactions inject noise into the model’s context. This noise can distract the model, derailing an otherwise sound chain of reasoning and actively degrading the final output.</p><p>To address the latency and cost issues of blind tool invocation, previous reinforcement learning methods attempted to penalize excessive tool usage by combining task accuracy and execution efficiency into one reward signal. However, this entangled design creates an unsolvable optimization dilemma. If the efficiency penalty is too aggressive, the model becomes overly conservative and suppresses essential tool use, sacrificing correctness on arduous tasks. Conversely, if the penalty is mild, the optimization signal loses its value and does not prevent tool overuse on simpler tasks.</p><p>Furthermore, this shared reward creates semantic ambiguity, where an inaccurate trajectory with zero tool calls might yield the same reward as an accurate trajectory with excessive tool usage. Because the training signals for accuracy and efficiency become entangled, the model can’t learn to control tool-use without degrading its core reasoning capabilities.</p><h2>Hierarchical decoupled policy optimization</h2><p>To solve the optimization dilemma of coupled rewards, the researchers introduced HDPO. HDPO separates accuracy and efficiency into two independent optimization channels. The accuracy channel focuses on maximizing task correctness across all of the model&#x27;s rollouts. The efficiency channel optimizes for execution economy.</p><p>HDPO computes the training signals for these two channels independently and only combines them at the final stage of loss computation. 
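The decoupling can be illustrated with a small sketch. All names and formulas below are hypothetical simplifications of the idea, not the paper&#x27;s actual objective: for one prompt&#x27;s group of rollouts, the accuracy advantage is computed over all rollouts, while the efficiency advantage is computed only among the correct ones and scaled by the group&#x27;s accuracy rate:

```python
# Illustrative sketch of HDPO-style decoupled training signals for one
# prompt's group of rollouts. A hypothetical simplification, not the
# paper's actual loss.
def decoupled_advantages(correct: list[bool], tool_calls: list[int]) -> list[float]:
    n = len(correct)
    # Accuracy channel: reward correctness, centered across all rollouts.
    acc_rate = sum(correct) / n
    acc_adv = [(1.0 if c else 0.0) - acc_rate for c in correct]

    # Efficiency channel: computed only among *correct* rollouts, so an
    # incorrect answer is never rewarded for using fewer tools.
    correct_calls = [t for c, t in zip(correct, tool_calls) if c]
    eff_adv = [0.0] * n
    if correct_calls:
        mean_calls = sum(correct_calls) / len(correct_calls)
        for i, (c, t) in enumerate(zip(correct, tool_calls)):
            if c:
                eff_adv[i] = mean_calls - t  # fewer calls than peers => positive

    # The two channels stay independent until this final combination.
    # Scaling efficiency by acc_rate keeps it negligible while the model
    # is mostly wrong and lets it grow once the model is mostly right --
    # the "implicit cognitive curriculum".
    return [a + acc_rate * e for a, e in zip(acc_adv, eff_adv)]

# Three correct rollouts with 0, 2, and 4 tool calls, one wrong but cheap:
print(decoupled_advantages([True, True, True, False], [0, 2, 4, 0]))
# [1.75, 0.25, -1.25, -0.75] -- the wrong rollout gets no efficiency bonus
```

The correct, tool-free rollout gets the largest advantage, while the incorrect rollout is penalized despite making zero tool calls, which is the semantic ambiguity a single coupled reward cannot resolve.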
The efficiency signal is conditional upon the accuracy channel. This means that an incorrect response is never rewarded simply for being fast or using fewer tools. This decoupling avoids situations where accuracy and efficiency gradients cancel each other out, providing the AI with clean learning signals for both goals.</p><p>The most powerful emergent property of this decoupled design is that it creates an implicit cognitive curriculum. Early in training, when the model still struggles with the task, the optimization is dominated by the accuracy objective, forcing the model to prioritize learning correct reasoning and knowledge. As the model&#x27;s reasoning capabilities mature and it consistently arrives at the right answers, the efficiency signal smoothly scales up. This mechanism causes the model to first master task resolution, and only then refine its self-reliance by avoiding redundant, costly API calls.</p><p>To complement HDPO, the researchers developed a rigorous, multi-stage data curation regime that tackles severe flaws found in existing tool-augmented datasets. Their data curation pipeline covers supervised fine-tuning (SFT) and reinforcement learning (RL) stages.</p><p>For the SFT phase, they sourced data from publicly available tool-augmented multimodal trajectories and filtered them to remove low-quality examples containing execution failures or feedback inconsistencies. They also aggressively filtered out any training sample that the base model could solve directly without tools. Finally, using Google&#x27;s <a href="https://venturebeat.com/technology/google-gemini-3-1-pro-first-impressions-a-deep-think-mini-with-adjustable">Gemini 3.1 Pro</a> as an automated judge, they filtered the SFT corpus to only keep examples that demonstrated strategic tool use.</p><p>For the RL phase, the curation focused on ensuring a stable optimization signal. They filtered out prompts with corrupted visuals or semantic ambiguity. 
The HDPO algorithm relies on comparing correct and incorrect responses. If a task is so trivially easy that the model always gets it right, or so prohibitively hard that it always fails, there is no meaningful variance to learn from. The team strictly retained only prompts that exhibited a non-trivial mix of successes and failures to guarantee an actionable gradient signal.</p><h2>Metis agent: HDPO in action</h2><p>To test HDPO in practice, the researchers used the framework to develop Metis, a multimodal reasoning agent equipped with coding and search tools. Metis is built on top of the Qwen3-VL-8B-Instruct vision-language model. The researchers trained it in two distinct stages. First, they applied SFT using their curated data to provide a cold-start initialization. Next, they applied RL using the HDPO framework, exposing the model to multi-turn interactions where it could invoke tools like Python code execution, text search, and image search.</p><p>The researchers pitted Metis against standard open-source vision models like LLaVA-OneVision, text-only reasoners, and state-of-the-art agentic models including DeepEyes V2 and the 30-billion-parameter Skywork-R1V4. The evaluation spanned two main areas: visual perception and document understanding datasets like HRBench and V*Bench, and rigorous mathematical and logical reasoning tasks like WeMath and MathVista.</p><p>On all tasks, Metis achieved state-of-the-art or highly competitive performance, outperforming existing agentic models, including the much larger Skywork-R1V4, across both visual perception and reasoning tasks.</p><p>Equally important is the qualitative behavior Metis showed in the experiments. For example, when presented with an image of a museum sign and asked what the center text says, standard agentic models waste time blindly writing Python scripts to crop the image just to read it. Metis, however, recognizes that the text is clearly legible in the raw image. 
It skips the tools entirely and uses a single inference pass.</p><p>In another experiment, the model was given a complex chart and asked to identify the second-highest line at a specific data point within a tiny subplot. Metis recognized that fine-grained visual analysis exceeded its native resolution capabilities and could not accurately distinguish the overlapping lines. Instead of guessing from the full image, it invoked Python to crop and zoom in exclusively on that specific subplot region, allowing it to correctly identify the line. It treats code as a precision instrument deployed only when the visual evidence is genuinely ambiguous, not as a default fallback.</p><p>The researchers released <a href="https://huggingface.co/Accio-Lab/Metis-8B-RL">Metis</a> along with the <a href="https://github.com/Accio-Lab/Metis">code for HDPO</a> under the permissive Apache 2.0 license.</p><p>“Our results demonstrate that strategic tool use and strong reasoning performance are not a trade-off; rather, eliminating noisy, redundant tool calls directly contributes to superior accuracy,” the researchers conclude. “More broadly, our work suggests a paradigm shift in tool-augmented learning: from merely teaching models how to execute tools, to cultivating the meta-cognitive wisdom of when to abstain from them.”</p>]]></description>
            <author>bendee983@gmail.com (Ben Dickson)</author>
            <category>Orchestration</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/5adrVJG12DsZYPv3bAT3Kk/786e22dcb26f295b11a3de9d91a97ac3/LLM_tool-use_abstention.jpg?w=300&amp;q=30" length="0" type="image/jpg"/>
        </item>
        <item>
            <title><![CDATA[One tool call to rule them all? New open source Python tool Runpod Flash eliminates containers for faster AI dev]]></title>
            <link>https://venturebeat.com/infrastructure/one-tool-call-to-rule-them-all-new-open-source-python-tool-runpod-flash-eliminates-containers-for-faster-ai-dev</link>
            <guid isPermaLink="false">2QDcDx0oeDpGEUWci5HZUp</guid>
            <pubDate>Thu, 30 Apr 2026 18:31:07 GMT</pubDate>
<description><![CDATA[<p><a href="https://www.runpod.io/about">Runpod</a>, the high-performance cloud computing and GPU platform designed specifically for AI development, today launched a new open-source, MIT-licensed, enterprise-friendly Python programming tool called <a href="https://www.runpod.io/blog/flash-is-ga">Runpod Flash</a> — and it is poised to make creation, iteration and deployment of AI systems inside and outside of foundation model labs much faster. </p><p>The tool aims to remove one of the biggest hurdles in training and using AI models today: the need for Docker packages and containerization when developing for serverless GPU infrastructure. The company believes this will speed up development and deployment of new AI models, applications and agentic workflows. </p><p>Additionally, the platform is built to serve as a critical substrate for AI agents and coding assistants—such as Claude Code, Cursor, and Cline—enabling them to orchestrate and deploy remote hardware autonomously with minimal friction.</p><p>Developers can utilize Flash to accomplish a diverse set of high-performance computing tasks, including cutting-edge deep learning research, model training, and fine-tuning. </p><p>&quot;We make it as easy as possible to be able to bring together the cosmos of different AI tooling that&#x27;s available in a function call,&quot; said Runpod chief technology officer (CTO) Brennen Smith, in a video call interview with VentureBeat last week. </p><p>The tool allows for the creation of sophisticated &quot;polyglot&quot; pipelines, where users can route data preprocessing to cost-effective CPU workers before automatically handing off the workload to high-end GPUs for inference. 
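</p><p>Conceptually, such a pipeline is ordinary function composition with an execution target attached to each stage. The toy sketch below mimics the routing idea with stand-in functions; it is not Runpod&#x27;s SDK and invents no Flash APIs:</p>

```python
# Toy illustration of a "polyglot" pipeline: cheap preprocessing runs on a
# CPU pool, then the cleaned batch is handed to a GPU pool for inference.
# Stand-in functions only; this is not Runpod's SDK.

def run_on(pool, fn, payload):
    """Pretend dispatcher that tags each result with the pool that ran it."""
    return {"pool": pool, "result": fn(payload)}

def preprocess(texts):
    # CPU-friendly cleanup step.
    return [t.strip().lower() for t in texts]

def infer(texts):
    # Placeholder "model": returns a score per input.
    return [len(t) for t in texts]

def pipeline(texts):
    cleaned = run_on("cpu-pool", preprocess, texts)
    return run_on("gpu-pool", infer, cleaned["result"])
```

<p>In Flash itself the stages dispatch to real remote worker pools rather than running in-process, but the developer-facing shape, as Smith describes it, remains a plain Python function call.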
</p><p>Beyond research and development, Flash supports production-grade requirements through features such as low-latency load-balanced HTTP APIs, queue-based batch processing, and persistent multi-datacenter storage.</p><h2><b>Eliminating the &#x27;packaging tax&#x27; of AI development</b></h2><p>The core value proposition of Flash GA is the removal of Docker from the serverless development cycle. </p><p>In traditional serverless GPU environments, a developer must containerize their code, manage a Dockerfile, build the image, and push it to a registry before a single line of logic can execute on a remote GPU. Runpod Flash treats this entire process as a &quot;packaging tax&quot; that slows down iteration cycles. </p><p>Under the hood, Flash utilizes a cross-platform build engine that enables a developer working on an M-series Mac to produce a Linux x86_64 artifact automatically. </p><p>This system identifies the local Python version, enforces binary wheels, and bundles dependencies into a deployable artifact that is mounted at runtime on Runpod’s serverless fleet. </p><p>This mounting strategy significantly reduces &quot;cold starts&quot;—the delay between a request and the execution of code—by avoiding the overhead of pulling and initializing massive container images for every deployment. </p><p>Furthermore, the technology infrastructure supporting Flash is built on a proprietary Software Defined Networking (SDN) and Content Delivery Network (CDN) stack.</p><p>Smith told VentureBeat that the hardest problems in GPU infrastructure are often not the GPUs themselves, but the networking and storage components that link them together. 
</p><p>&quot;Everyone is talking about agentic AI, but the way I personally see it — and the way the leadership team at Runpod sees it — is that there needs to be a really good substrate and glue for these agents, whatever they might be powered by, to be able to work with,&quot; Smith said.</p><p>Flash leverages this low-latency substrate to handle service discovery and routing, enabling cross-endpoint function calls. This allows developers to build &quot;polyglot&quot; pipelines where, for instance, a cheap CPU endpoint handles data preprocessing before routing the clean data to a high-end NVIDIA H100 or B200 GPU for inference. </p><h2><b>Four distinct workload architectures supported</b></h2><p>While the Flash beta focused on live-test endpoints, the GA release introduces a suite of features designed for production-grade reliability.</p><p>The primary interface is the new <code>@Endpoint</code> decorator, which consolidates configuration—such as GPU type, worker scaling, and dependencies—directly into the code. The GA release defines four distinct architectural patterns for serverless workloads: </p><ul><li><p><b>Queue-based</b>: Designed for asynchronous batch jobs where functions are decorated and run. </p></li><li><p><b>Load-balanced</b>: Tailored for low-latency HTTP APIs where multiple routes share a pool of workers without queue overhead. </p></li><li><p><b>Custom Docker Images</b>: A fallback for complex environments like vLLM or ComfyUI where a pre-built worker is already available. </p></li><li><p><b>Existing Endpoints</b>: Using Flash as a Python client to interact with previously deployed Runpod resources via their unique IDs. </p></li></ul><p>A critical addition for production environments is the <code>NetworkVolume</code> object, which provides first-class support for persistent storage across multiple datacenters. 
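</p><p>The consolidation that the <code>@Endpoint</code> decorator performs can be mimicked with ordinary Python: attach the deployment configuration to the function object instead of keeping it in a separate Dockerfile. The sketch below is a toy stand-in, not Runpod&#x27;s actual SDK, and its parameter names are guesses:</p>

```python
import functools

def endpoint(gpu="CPU", workers=1, dependencies=()):
    """Toy stand-in for a Flash-style decorator: deployment settings
    live next to the code they describe rather than in a Dockerfile."""
    def wrap(fn):
        @functools.wraps(fn)
        def runner(*args, **kwargs):
            return fn(*args, **kwargs)
        runner.deploy_config = {
            "gpu": gpu,
            "workers": workers,
            "dependencies": tuple(dependencies),
        }
        return runner
    return wrap

@endpoint(gpu="H100", workers=3, dependencies=("torch",))
def infer(x):
    # Placeholder workload; a real handler would run model inference.
    return x * 2
```

<p>A deployment tool can then read <code>infer.deploy_config</code> to decide where and how to run the function, which is the pattern that lets configuration travel with the code.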
</p><p>Files mounted at <code>/runpod-volume/</code> allow for model weights and large datasets to be cached once and reused, further mitigating the impact of cold starts during scaling events. </p><p>Additionally, Runpod has introduced environment variable management that is excluded from the configuration hash, meaning developers can rotate API keys or toggle feature flags without triggering an entire endpoint rebuild.</p><p>To address the rise of AI-assisted development, Runpod has released specific skill packages for coding agents like Claude Code, Cursor, and Cline. </p><p>These packages provide agents with deep context regarding the Flash SDK, effectively reducing syntax hallucinations and allowing agents to write functional deployment code autonomously. </p><p>This move positions Flash not just as a tool for humans, but as the &quot;substrate and glue&quot; for the next generation of AI agents. </p><h2><b>Why open source Runpod Flash?</b></h2><p>Runpod has released the Flash SDK under the <b>MIT License</b>, one of the most permissive open-source licenses available.</p><p>This choice is a deliberate strategic move to maximize market share and developer adoption. In contrast to more restrictive licenses like the <b>GPL (General Public License)</b>, which can impose &quot;copyleft&quot; requirements—potentially forcing companies to open-source their own proprietary code if it links to the library—the MIT license allows for unrestricted commercial use, modification, and distribution. </p><p>Smith explained this philosophy as a &quot;motivating construct&quot; for the company: &quot;I prefer to win based on product quality and product innovation rather than legal ease and lawyers,&quot; he told VentureBeat.</p><p>By adopting a permissive license, Runpod lowers the barrier for enterprise adoption, as legal teams do not have to navigate the complexities of restrictive open-source compliance. 
</p><p>Furthermore, it invites the community to fork and improve the tool, which Runpod can then integrate back into the official release, fostering a collaborative ecosystem that accelerates the development of the platform. </p><h2><b>Timing is everything: Runpod&#x27;s growth and market positioning</b></h2><p>The launch of Flash GA comes at a time of explosive growth for <a href="https://www.runpod.io/press/runpod-ai-cloud-surpasses-120m-in-arr">Runpod, which has surpassed $120 million in Annual Recurring Revenue (ARR)</a> and serves a developer base of over 750,000 since it was<a href="https://www.runpod.io/blog/founder-series-1-origin-story"> founded in 2022</a>.</p><p>The company’s growth is driven by two distinct segments: the &quot;P90&quot; enterprises—large-scale operations like Anthropic, OpenAI, and Perplexity—and the &quot;sub-P90&quot; independent researchers and students who represent the vast majority of the user base. </p><p>The platform’s agility was recently demonstrated during the <a href="https://venturebeat.com/technology/deepseek-v4-arrives-with-near-state-of-the-art-intelligence-at-1-6th-the-cost-of-opus-4-7-gpt-5-5">release of DeepSeek V4 in preview</a> last week. Within minutes of the model’s debut, developers were utilizing Runpod infrastructure to deploy and test the new architecture. </p><p>This &quot;real-time&quot; capability is a direct result of Runpod’s specialized focus on AI developers, offering over 30 GPU SKUs and billing by the millisecond to ensure that every dollar of spend results in maximum throughput. </p><p>Runpod&#x27;s position as the &quot;most cited AI cloud on GitHub&quot; suggests that it has successfully captured the developer mindshare required to sustain its momentum. </p><p>With Flash GA, the company is attempting to transition from being a provider of raw compute to becoming the essential orchestration layer for the AI-first cloud. 
</p><p>As development shifts toward &quot;intent-based&quot; coding—where the outcome is prioritized over the execution details—tools that bridge the gap between local ideas and global scale will likely define the next era of computing. </p>]]></description>
            <author>carl.franzen@venturebeat.com (Carl Franzen)</author>
            <category>Infrastructure</category>
            <enclosure url="https://images.ctfassets.net/jdtwqhzvc2n1/MHYoJfMiFcReiUHztmcXO/cd5bfd956110f341d2e205f020a78097/ChatGPT_Image_Apr_30__2026__02_28_07_PM.png?w=300&amp;q=30" length="0" type="image/png"/>
        </item>
    </channel>
</rss>