On a recent transatlantic flight, Mark Ruddock, an entrepreneur in residence at GALLOS Technologies, decided to put his team of AI agents to work. He was 34,000 feet over the Atlantic with a high-stakes product demo for a key client in less than 48 hours, and his software platform wasn't ready.

By the time his flight crossed Iceland, he recounted in an interview with VentureBeat, his "Claude Code swarm" had built over 50 React components, a mock API set for three enterprise integrations and a full admin interface. What would typically take a human team 18 developer-days was compressed into a six-hour flight. The output was more than a prototype; it was a fully documented, tested and secured application framework, complete with production-ready Docker configs and a CI/CD pipeline.

“I will never build a software company the same way again, I promise you,” Ruddock told me in an interview yesterday.

Ruddock’s experience, which he first posted about on LinkedIn last week, isn’t isolated. It’s a snapshot of a significant boost in AI capabilities that occurred this summer. In just a few months, the ability of AI to perform complex software engineering has accelerated at a dizzying, non-linear pace. Driven by major advances in both technology and practice, the improvement – documented in benchmarks and practitioner accounts alike – amounts to a fundamental change in how software is created, one that is already making last year’s paradigms obsolete.

The era of "vibe coding" – the conversational, often exploratory practice of prompting an AI for code, a term coined by Andrej Karpathy – has given way to a more disciplined concept: agentic swarm coding.

(To be sure, vibe coding may not be dead for the vast majority of developers who are building something on the fly. But here we are talking about serious enterprise application developers – the target audience for this piece.)

The summer AGI got real for developers

“Even Karpathy’s vibe coding term is legacy now. It’s outdated,” Val Bercovici, chief AI officer of WEKA, told me in a recent conversation. “It’s been superseded by this concept of agentic swarm coding, where multiple agents in coordination are delivering… very functional MVPs and version one apps.” And Bercovici’s view carries some weight: He’s a long-time infrastructure veteran who served as a CTO at NetApp and was a founding board member of the Cloud Native Computing Foundation (CNCF), which stewards Kubernetes.

The idea of swarms isn't entirely new — OpenAI's own agent SDK was originally called Swarm when it was first released as an experimental framework last year. But the capability of these swarms reached an inflection point this summer.

Bercovici, a self-proclaimed former skeptic of AGI, said he’s now a believer that AGI is coming, having witnessed the code advances this summer, along with his own team’s transformation. He described how even his most cynical engineer – a man nicknamed the "Prince of Darkness" – has been converted by the sheer quality and speed of what today’s agentic systems can produce.

For developers who have honed their craft for decades, this stuff feels almost like science fiction. “Experienced software developers… are seeing our entire craft of 30, 40 years, fundamentally change in the span of a couple of months,” Bercovici said.

Deconstructing the leap: How agent swarms work

Several distinct technological advances are converging to drive this sudden leap: a new generation of foundation models, a maturation of agentic architectures and a rapid evolution in the skills of the humans guiding them.

Based on my conversations with Ruddock and others at the forefront, the acceleration is driven by these three core pillars.

1. Smarter Foundation Models: The raw intelligence of the underlying models from OpenAI (GPT-5), Anthropic (Claude 4 series) and xAI (Grok 4) took a significant step up this summer. On the industry-standard SWE-bench, which tests an AI’s ability to resolve actual GitHub issues, models released this summer have shattered previous records. OpenAI’s GPT-5, for instance, now achieves a 74.9% success rate on these tasks, which require deep context, planning and debugging. That compares with OpenAI’s previous high of 58.4%, set by its o3 model in July. Claude Opus 4.1 hit 74.5% in August, up from Claude Opus 4’s 72.4% in May.

2. Sophisticated Agentic Architectures: More important than any single model, though, is how the models are orchestrated. The "swarm" is an architecture in which a problem is decomposed and assigned to multiple specialized agents. A paper by METR, an organization that studies advanced AI systems, reported in March that "the length of tasks AI can do is doubling every 7 months." However, Amjad Masad, CEO of Replit, tweeted Wednesday that this “radically undersells” the 10x scaling that Replit’s software agent saw over the same period, aided by orchestration, including multi-agent architecture (see chart at top, and below). For context, Replit is the same agentic solution used by Mark Ruddock of GALLOS.
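The gap between those two trend lines is easy to quantify. METR’s reported doubling time implies a 2x gain over a seven-month window; a 10x gain over roughly the same window (an assumption based on the figures in this article) implies a doubling time of about two months. A quick back-of-the-envelope check:

```python
import math

# METR's March finding: AI task length doubles every 7 months.
metr_doubling_months = 7.0
# Masad's claim for Replit's agent: ~10x over roughly the same window.
replit_factor = 10.0

# Growth factor METR's trend predicts over a 7-month window: 2^(7/7) = 2x.
metr_factor = 2 ** (7.0 / metr_doubling_months)

# Doubling time implied by a 10x jump in 7 months: 7 * log(2) / log(10).
implied_doubling = 7.0 * math.log(2) / math.log(replit_factor)

print(f"METR trend over 7 months: {metr_factor:.0f}x")
print(f"Implied doubling time at 10x per 7 months: {implied_doubling:.1f} months")
```

In other words, Replit’s claimed curve is steep enough to compress METR’s seven-month doubling into roughly two months – which is the sense in which Masad argues the paper “radically undersells” the trend.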

Coding capabilities of base models are accelerating. Source: METR

This new generation of agentic systems has mastered several key structural capabilities:

  • Replanning: Agents can now dynamically edit their own task lists, allowing them to adapt when a step fails or a better path emerges. This capability is a key reason agent frameworks like Warp have achieved top SWE-bench scores of 75.8% this month, when run with a custom orchestration setup.

  • Multi-agent Specialization: Instead of one AI trying to do everything, agentic swarms assign roles. A "planner" agent breaks down the task, "coder" agents write the code, and a "critic" agent reviews the work. This mirrors a human software team and is the principle behind frameworks like Claude Flow, developed by Toronto-based Reuven Cohen. Bercovici described it as a system where "tens of instances of Claude Code in parallel are being orchestrated to work on specifications, documentation... the full CI/CD DevOps life cycle." This is the engine behind the agentic swarm, condensing a month of teamwork into a single hour.

  • Model Switching: Advanced systems can intelligently route sub-tasks to the best model for the job – perhaps using Claude for high-level reasoning, GPT-5 for raw code synthesis, and Grok 4 for rapid iteration.

  • Real Tool Integration: This is perhaps the biggest game-changer. Agents are no longer just writing code in a vacuum. They are now grounded in the developer's actual environment, using essential tools like grep to search the codebase, pytest to run tests, build tools like make or pip to compile and set up projects, and git diff to manage changes. This closes the loop between code generation and real-world validation.

  • Sustained Autonomous Operation: Early agents would get lost or run out of context on complex tasks. As Y Combinator founder Paul Graham tweeted Wednesday, a key test for AI is how long it can "continue thinking about something productively." This summer, that duration has exploded. Replit’s Agent 3, for example, can now run autonomously for up to 200 minutes to complete a task – a dramatic increase from the 20-minute runs of its Agent 2 predecessor in February.

3. The Rise of the ‘Agentic Engineer’: The final piece is the human. The most effective practitioners are not passive prompters; they are what Ruddock calls "agentic engineers." They provide the scaffolding, discipline, and rigorous oversight that turns AI-generated “slop” into enterprise-grade software.

Ruddock’s process, for example, involves having agents write a detailed Product Requirements Document (PRD) first. He then uses a second agent with a skeptical "persona" to review the first agent's code. Finally, he conducts his own review. "You have to be super intentional about this," he explained. "I'm much better at this now, because I know how to ask, what to ask for, how to give it the guardrails to sanity check its own work."
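As a rough illustration of that checks-and-balances workflow – not Ruddock’s actual code – the three stages can be sketched as a pipeline. The `ask` function below is a hypothetical stand-in for an LLM call, stubbed with deterministic answers so the sketch runs offline; the skeptical persona prompt echoes the approach described above.

```python
# Persona prompt for the second, adversarial reviewer agent (illustrative).
SKEPTIC = ("You are a skeptical senior reviewer. Assume the code is "
           "flawed until proven otherwise; reject anything untested.")

def ask(persona: str, prompt: str) -> str:
    """Stand-in for an LLM API call; deterministic stub for the sketch."""
    if "Write a PRD" in prompt:
        return "PRD: the function must add two integers and be tested."
    if "Implement" in prompt:
        return "def add(a, b):\n    return a + b"
    return "APPROVED"  # the skeptic signs off in this toy run

def build_with_review(goal: str) -> dict:
    # 1. A first agent writes the Product Requirements Document.
    prd = ask("product manager", f"Write a PRD for: {goal}")
    # 2. A second agent implements against that PRD.
    code = ask("engineer", f"Implement this PRD:\n{prd}")
    # 3. The skeptical persona reviews before any human does;
    #    the human's own review remains the final gate.
    verdict = ask(SKEPTIC, f"Review this code against the PRD:\n{code}")
    return {"prd": prd, "code": code, "verdict": verdict}

result = build_with_review("tiny addition helper")
print(result["verdict"])
```

The point of the structure is the guardrail, not the stub: each stage produces an artifact the next stage is forced to check against.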

From prototype to production: Engineering for the enterprise

The most significant evidence of this summer’s leap is that the output of agent swarms has gone way beyond the prototype, which until recently was seen as a ceiling for agentic coding. The agents are building the foundations of production-ready applications. This refutes the common criticism that AI-generated code is "AI slop" unsuitable for real-world deployment.

Ruddock is emphatic on this point. The application built on his flight was "Docker capable, Kubernetes capable, runs all the security checks you would expect... all by the time I landed." This wasn't an accident; it was by design. He starts his projects from a "canonical template" in GitHub that already includes workflows for security scans and code quality checks.

This is where agent specialization becomes critical for enterprise needs. Ruddock assigns specific personas to his agents to enforce discipline. For example, he instructs one agent to act as a "15-year security veteran" with deep experience in analyzing code for flaws. This specialized agent is tasked with reviewing the work of the primary coding agents, creating a system of checks and balances that mimics a senior engineering review.

Bercovici’s experience mirrors this. He notes that agentic swarms are now delivering applications complete with "security audits, red teaming, compliance documentation and enterprise authentication" – all the components that separate a demo from a deployable product. The swarm is automating the entire, rigorous CI/CD and DevOps lifecycle, he said.

This shift is profound. The conversation has moved on from whether AI can write a function or vibe-code a prototype, to whether an AI agent team can build, test, secure and deploy an entire application. The answer, increasingly, is yes.

With a huge caveat.

The reality check: Good days and bad days

This new paradigm still comes with challenges. The performance of agent swarms can be inconsistent. “There are days that brilliant agents show up that I'm amazed and audit their work,” Ruddock admitted. “And there are days that agents show up like… a freaking shit head.” He said he never knows what sort of agent he is going to get. To hedge against this, he has the agents spin up several versions of the same product in parallel, so he can pick the one that performs best.
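That best-of-N tactic is straightforward to sketch with a thread pool. Here `build_candidate` is a hypothetical stand-in for one full agent run, with its score simulated deterministically; in a real system the score might be the fraction of the test suite each candidate passes.

```python
from concurrent.futures import ThreadPoolExecutor

def build_candidate(seed: int) -> dict:
    """Stand-in for one independent agent run on the same spec.
    A real run would return code plus test results; the score here
    is simulated deterministically so the sketch is reproducible."""
    return {"seed": seed, "score": (seed * 37) % 100}

def best_of(n: int) -> dict:
    # Launch n identical builds in parallel, then keep the top scorer.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(build_candidate, range(n)))
    return max(candidates, key=lambda c: c["score"])

winner = best_of(4)
print("best seed:", winner["seed"], "score:", winner["score"])
```

The design choice is worth noting: parallel redundancy trades compute for variance reduction, which is exactly the inconsistency problem described above.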

The bottom line, though, is that the cognitive overhead of managing these systems is immense. The bottleneck is shifting from the speed of writing code to the speed of verifying it – a shift that bogs down and frustrates the humans in the loop. A recent study from METR found that AI tools can actually slow down experienced developers on complex tasks, because the time spent on meticulous review and debugging negates the initial gains from code generation.

Contrary to the "vibe coding" ethos of letting an AI work without deep involvement, seasoned engineers want more control, not less. They are increasingly open to using LLMs, but they resist the idea of a system that builds everything without a tight, conversational feedback loop. The real challenge for agentic platforms is to provide powerful automation without sacrificing the developer's ability to intervene, question and steer the process.

This underscores another key point: agentic swarming is less about replacing developers and more about augmenting the most skilled ones, transforming their role from hands-on-keyboard coder to a high-level architect and validator of an AI-powered engineering team.

The hype cycle outpaces reality. Six months ago, Anthropic CEO Dario Amodei predicted that AI would be "writing 90 percent of code" by now – a milestone we are clearly nowhere near. While progress is undeniably exponential, as the METR and Replit charts show, the path to truly autonomous, reliable software creation is still fraught with complexity.

At a recent DeepMind event, someone asked Google DeepMind chief scientist Jeff Dean when we could trust software written by LLMs enough to operate an airplane. After a long pause, Dean reportedly replied, "Are there humans on the plane?" His response, while humorous, underscores the verification challenges that remain. Dean went on to say that while care is needed during implementation, technology is moving at such a pace that, not far out, he does expect the majority of software to be written by LLMs.

A new moat for the enterprise

The extreme acceleration this summer has permanently altered the landscape of software development. The democratization of agentic workflows, accelerated by user-facing code execution tools like Claude’s Code Interpreter, as noted by expert Simon Willison, means that the barrier to creating complex, deployed software is collapsing.

This creates a new competitive reality for the enterprise. As Ruddock put it, the modern moat for a software company is about having a "unique perspective on a problem domain" and the ability to "execute with an unbelievable velocity." It’s less about the software, which can be built in days, if not hours.

For enterprise leaders and technical decision-makers, the summer of 2025 will be remembered as the moment the starting gun fired on a new race in coding apps – one that will be won by those who can most effectively orchestrate agentic intelligence.