Automation | VentureBeat

Salesforce rolls out new Slackbot AI agent as it battles Microsoft and Google in workplace AI

michael.nunez@venturebeat.com (Michael Nuñez) — Tue, 13 Jan 2026 13:00:00 GMT

Salesforce on Tuesday launched an entirely rebuilt version of Slackbot, the company's workplace assistant, transforming it from a simple notification tool into what executives describe as a fully powered AI agent capable of searching enterprise data, drafting documents, and taking action on behalf of employees.

The new Slackbot, now generally available to Business+ and Enterprise+ customers, is Salesforce's most aggressive move yet to position Slack at the center of the emerging "agentic AI" movement — where software agents work alongside humans to complete complex tasks. The launch comes as Salesforce attempts to convince investors that artificial intelligence will bolster its products rather than render them obsolete.

"Slackbot isn't just another copilot or AI assistant," said Parker Harris, Salesforce co-founder and Slack's chief technology officer, in an exclusive interview with VentureBeat. "It's the front door to the agentic enterprise, powered by Salesforce."

From tricycle to Porsche: Salesforce rebuilt Slackbot from the ground up

Harris was blunt about what distinguishes the new Slackbot from its predecessor: "The old Slackbot was, you know, a little tricycle, and the new Slackbot is like, you know, a Porsche."

The original Slackbot, which has existed since Slack's early days, performed basic algorithmic tasks — reminding users to add colleagues to documents, suggesting channel archives, and delivering simple notifications. The new version runs on an entirely different architecture built around a large language model and sophisticated search capabilities that can access Salesforce records, Google Drive files, calendar data, and years of Slack conversations.

"It's two different things," Harris explained. "The old Slackbot was algorithmic and fairly simple. The new Slackbot is brand new — it's based around an LLM and a very robust search engine, and connections to third-party search engines, third-party enterprise data."

Salesforce chose to retain the Slackbot brand despite the fundamental technical overhaul. "People know what Slackbot is, and so we wanted to carry that forward," Harris said.

Why Anthropic's Claude powers the new Slackbot — and which AI models could come next

The new Slackbot runs on Claude, Anthropic's large language model, a choice driven partly by compliance requirements. Slack's commercial service operates under FedRAMP Moderate certification to serve U.S. federal government customers, and Harris said Anthropic was "the only provider that could give us a compliant LLM" when Slack began building the new system.

But that exclusivity won't last. "We are, this year, going to support additional providers," Harris said. "We have a great relationship with Google. Gemini is incredible — performance is great, cost is great. So we're going to use Gemini for some things." He added that OpenAI remains a possibility as well.

Harris echoed Salesforce CEO Marc Benioff's view that large language models are becoming commoditized: "You've heard Marc talk about LLMs are commodities, that they're democratized. I call them CPUs."

On the sensitive question of training data, Harris was unequivocal: Salesforce does not train any models on customer data. "Models don't have any sort of security," he explained. "If we trained it on some confidential conversation that you and I have, I don't want Carolyn to know — if I train it into the LLM, there is no way for me to say you get to see the answer, but Carolyn doesn't."

Inside Salesforce's internal experiment: 80,000 employees tested Slackbot with striking results

Salesforce has been testing the new Slackbot internally for months, rolling it out to all 80,000 employees. According to Ryan Gavin, Slack's chief marketing officer, the results have been striking: "It's the fastest adopted product in Salesforce history."

Internal data shows that two-thirds of Salesforce employees have tried the new Slackbot, with 80% of those users continuing to use it regularly. Internal satisfaction rates reached 96% — the highest for any AI feature Slack has shipped. Employees report saving between two and 20 hours per week.

The adoption happened largely organically. "I think it was about five days, and a Canvas was developed by our employees called 'The Most Stealable Slackbot Prompts,'" Gavin said. "People just started adding to it organically. I think it's up to 250-plus prompts that are in this Canvas right now."

Kate Crotty, a principal UX researcher at Salesforce, found that 73% of internal adoption was driven by social sharing rather than top-down mandates. "Everybody is there to help each other learn and communicate hacks," she said.

How Slackbot transforms scattered enterprise data into executive-ready insights

During a product demonstration, Amy Bauer, Slack's product experience designer, showed how Slackbot can synthesize information across multiple sources. In one example, she asked Slackbot to analyze customer feedback from a pilot program, uploaded an image of a usage dashboard, and had Slackbot correlate the qualitative and quantitative data.

"This is where Slackbot really earns its keep for me," Bauer explained. "What it's doing is not just simply reading the image — it's actually looking at the image and comparing it to the insight it just generated for me."

Slackbot can then query Salesforce to find enterprise accounts with open deals that might be good candidates for early access, creating what Bauer called "a really great justification and plan to move forward." Finally, it can synthesize all that information into a Canvas — Slack's collaborative document format — and find calendar availability among stakeholders to schedule a review meeting.

"Up until this point, we have been working in a one-to-one capacity with Slackbot," Bauer said. "But one of the benefits that I can do now is take this insight and have it generate this into a Canvas, a shared workspace where I can iterate on it, refine it with Slackbot, or share it out with my team."

Rob Seaman, Slack's chief product officer, said the Canvas creation demonstrates where the product is heading: "This is making a tool call internally to Slack Canvas to actually write, effectively, a shared document. But it signals where we're going with Slackbot — we're eventually going to be adding in additional third-party tool calls."

MrBeast's company became a Slackbot guinea pig—and employees say they're saving 90 minutes a day

Among Salesforce's pilot customers is Beast Industries, the parent company of YouTube star MrBeast. Luis Madrigal, the company's chief information officer, joined the launch announcement to describe his experience.

"As somebody who has rolled out enterprise technologies for over two decades now, this was practically one of the easiest," Madrigal said. "The plumbing is there. Slack as an implementation, Enterprise Tools — being able to turn on the Slackbot and the Slack AI functionality was as simple as having my team go in, review, do a quick security review."

Madrigal said his security team signed off "rather quickly" — unusual for enterprise AI deployments — because Slackbot accesses only the information each individual user already has permission to view. "Given all the guardrails you guys have put into place for Slackbot to be unique and customized to only the information that each individual user has, only the conversations and the Slack rooms and Slack channels that they're part of—that made my security team sign off rather quickly."
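
The guardrail Madrigal describes, and Harris's earlier point that a trained model cannot say "you get to see the answer, but Carolyn doesn't," both come down to enforcing permissions at retrieval time rather than inside the model. A minimal sketch of that idea, assuming a hypothetical in-memory index of messages tagged by channel: results are filtered against the requesting user's channel memberships before anything is matched or handed to a model. The `Message`, `User`, and `search` names are illustrative only and are not Slack's API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Message:
    channel: str
    text: str


@dataclass
class User:
    name: str
    channels: Set[str] = field(default_factory=set)  # channels the user belongs to


def visible_messages(user: User, index: List[Message]) -> List[Message]:
    """Keep only messages from channels the user can already see."""
    return [m for m in index if m.channel in user.channels]


def search(user: User, index: List[Message], query: str) -> List[str]:
    """Apply the permission filter first, then a naive case-insensitive match."""
    q = query.lower()
    return [m.text for m in visible_messages(user, index) if q in m.text.lower()]


index = [
    Message("#launch", "Launch date moved to March"),
    Message("#exec-private", "Launch budget cut"),
]
alice = User("alice", {"#launch"})
print(search(alice, index, "launch"))  # only the #launch message is returned
```

Because the filter runs on every query, revoking a channel membership takes effect immediately, which is not possible once the same text has been trained into model weights.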

One Beast Industries employee, Sinan, the head of Beast Games marketing, reported saving "at bare minimum, 90 minutes a day." Another employee, Spencer, a creative supervisor, described it as "an assistant who's paying attention when I'm not."

Other pilot customers include Slalom, reMarkable, Xero, Mercari, and Engine. Mollie Bodensteiner, SVP of Operations at Engine, called Slackbot "an absolute 'chaos tamer' for our team," estimating it saves her about 30 minutes daily "just by eliminating context switching."

Slackbot vs. Microsoft Copilot vs. Google Gemini: The fight for enterprise AI dominance

The launch puts Salesforce in direct competition with Microsoft's Copilot, which is integrated into Teams and the broader Microsoft 365 suite, as well as Google's Gemini integrations across Workspace. When asked what distinguishes Slackbot from these alternatives, Seaman pointed to context and convenience.

"The thing that makes it most powerful for our customers and users is the proximity — it's just right there in your Slack," Seaman said. "There's a tremendous convenience affordance that's naturally built into it."

The deeper advantage, executives argue, is that Slackbot already understands users' work without requiring setup or training. "Most AI tools sound the same no matter who is using them," the company's announcement stated. "They lack context, miss nuance, and force you to jump between tools to get anything done."

Harris put it more directly: "If you've ever had that magic experience with AI — I think ChatGPT is a great example, it's a great experience from a consumer perspective — Slackbot is really what we're doing in the enterprise, to be this employee super agent that is loved, just like people love using Slack."

Amy Bauer emphasized the frictionless nature of the experience. "Slackbot is inherently grounded in the context, in the data that you have in Slack," she said. "So as you continue working in Slack, Slackbot gets better because it's grounded in the work that you're doing there. There is no setup. There is no configuration for those end users."

Salesforce's ambitious plan to make Slackbot the one 'super agent' that controls all the others

Salesforce positions Slackbot as what Harris calls a "super agent" — a central hub that can eventually coordinate with other AI agents across an organization.

"Every corporation is going to have an employee super agent," Harris said. "Slackbot is essentially taking the magic of what Slack does. We think that Slackbot, and we're really excited about it, is going to be that."

The vision extends to third-party agents already launching in Slack. Last month, Anthropic released a preview of Claude Code for Slack, allowing developers to interact with Claude's coding capabilities directly in chat threads. OpenAI, Google, Vercel, and others have also built agents for the platform.

"Most of the net-new apps that are being deployed to Slack are agents," Seaman noted during the press conference. "This is proof of the promise of humans and agents coexisting and working together in Slack to solve problems."

Harris described a future where Slackbot becomes an MCP (Model Context Protocol) client, able to leverage tools from across the software ecosystem — similar to how the developer tool Cursor works. "Slack can be an MCP client, and Slackbot will be the hub of that, leveraging all these tools out in the world, some of which will be these amazing agents," he said.
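
MCP messages are JSON-RPC 2.0. To illustrate what acting as an MCP client involves at the wire level, the sketch below builds the two requests a client uses to discover a server's tools (`tools/list`) and to invoke one (`tools/call`). The tool name `create_canvas` and its arguments are hypothetical, and the transport (stdio or HTTP) and initialization handshake are omitted.

```python
import json
from typing import Any, Dict


def tools_list_request(request_id: int) -> str:
    """Ask an MCP server which tools it exposes."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def tools_call_request(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Invoke one server-side tool by name with JSON arguments."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical tool name and arguments, for illustration only.
print(tools_list_request(1))
print(tools_call_request(2, "create_canvas", {"title": "Pilot review"}))
```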

But Harris also cautioned against over-promising on multi-agent coordination. "I still think we're in the single agent world," he said. "FY26 is going to be the year where we started to see more coordination. But we're going to do it with customer success in mind, and not demonstrate and talk about, like, 'I've got 1,000 agents working together,' because I think that's unrealistic."

Slackbot costs nothing extra, but Salesforce's data access fees could squeeze some customers

Slackbot is included at no additional cost for customers on Business+ and Enterprise+ plans. "There's no additional fees customers have to do," Gavin confirmed. "If they're on one of those plans, they're going to get Slackbot."

However, some enterprise customers may face other cost pressures tied to Salesforce's broader data strategy. CIOs may see price increases for third-party applications that work with Salesforce data as higher charges for API access ripple through the software supply chain.

Fivetran CEO George Fraser has warned that Salesforce's shift in pricing policy for API access could have tangible consequences for enterprises relying on Salesforce as a system of record. "They might not be able to use Fivetran to replicate their data to Snowflake and instead have to use Salesforce Data Cloud. Or they might find that they are not able to interact with their data via ChatGPT, and instead have to use Agentforce," Fraser said in a recent CIO report.

Salesforce has framed the pricing change as standard industry practice.

What Slackbot can do today, what's coming in weeks, and what's still on the roadmap

The new Slackbot begins rolling out today and will reach all eligible customers by the end of February. The mobile rollout will be complete by March 3, Bauer confirmed during her interview with VentureBeat.

Some capabilities remain works in progress. Calendar reading and availability checking are available at launch, but the ability to actually book meetings is "coming a few weeks after," according to Seaman. Image generation is not currently supported, though Bauer said it's "something that we are looking at in the future."

When asked about integration with competing CRM systems like HubSpot and Microsoft Dynamics, Salesforce representatives declined to provide specifics during the interview, though they acknowledged the question touched on key competitive differentiators.

Salesforce is betting the future of work looks like a chat window—and it's not alone

The Slackbot launch is Salesforce's bet that the future of enterprise work is conversational — that employees will increasingly prefer to interact with AI through natural language rather than navigating traditional software interfaces.

Harris described Slack's product philosophy using principles like "don't make me think" and "be a great host." The goal, he said, is for Slackbot to surface information proactively rather than requiring users to hunt for it.

"One of the revelations for me is LLMs applied to unstructured information are incredible," Harris said. "And the amount of value you have if you're a Slack user, if your corporation uses Slack — the amount of value in Slack is unbelievable. Because you're talking about work, you're sharing documents, you're making decisions, but you can't as a human go through that and really get the same value that an LLM can do."

Looking ahead, Harris expects the interfaces themselves to evolve beyond pure conversation. "We're kind of saturating what we can do with purely conversational UIs," he said. "I think we'll start to see agents building an interface that best suits your intent, as opposed to trying to surface something within a conversational interface that matches your intent."

Microsoft, Google, and a growing roster of AI startups are placing similar bets — that the winning enterprise AI will be the one embedded in the tools workers already use, not another application to learn. The race to become that invisible layer of workplace intelligence is now fully underway.

For Salesforce, the stakes extend beyond a single product launch. After a bruising year on Wall Street and persistent questions about whether AI threatens its core business, the company is wagering that Slackbot can prove the opposite: that the tens of millions of people already chatting in Slack every day are not a vulnerability, but an unassailable advantage.

Haley Gault, a Salesforce account executive in Pittsburgh who stumbled upon the new Slackbot one snowy morning, captured the shift in a single sentence: "I honestly can't imagine working for another company not having access to these types of tools. This is just how I work now."

That's precisely what Salesforce is counting on.

Anthropic launches Cowork, a Claude Desktop agent that works in your files — no coding required

michael.nunez@venturebeat.com (Michael Nuñez) — Mon, 12 Jan 2026 11:30:00 GMT

Anthropic released Cowork on Monday, a new AI agent capability that extends the power of its wildly successful Claude Code tool to non-technical users — and according to company insiders, the team built the entire feature in approximately a week and a half, largely using Claude Code itself.

The launch marks a major inflection point in the race to deliver practical AI agents to mainstream users, positioning Anthropic to compete not just with OpenAI and Google in conversational AI, but with Microsoft's Copilot in the burgeoning market for AI-powered productivity tools.

"Cowork lets you complete non-technical tasks much like how developers use Claude Code," the company announced via its official Claude account on X. The feature arrives as a research preview available exclusively to Claude Max subscribers — Anthropic's power-user tier priced between $100 and $200 per month — through the macOS desktop application.

For the past year, the industry narrative has focused on large language models that can write poetry or debug code. With Cowork, Anthropic is betting that the real enterprise value lies in an AI that can open a folder, read a messy pile of receipts, and generate a structured expense report without human hand-holding.

How developers using a coding tool for vacation research inspired Anthropic's latest product

The genesis of Cowork lies in Anthropic's recent success with the developer community. In February 2025, the company released Claude Code, a terminal-based tool that allowed software engineers to automate rote programming tasks. The tool was a hit, but Anthropic noticed a peculiar trend: users were forcing the coding tool to perform non-coding labor.

According to Boris Cherny, an engineer at Anthropic, the company observed users deploying the developer tool for an unexpectedly diverse array of tasks.

"Since we launched Claude Code, we saw people using it for all sorts of non-coding work: doing vacation research, building slide decks, cleaning up your email, cancelling subscriptions, recovering wedding photos from a hard drive, monitoring plant growth, controlling your oven," Cherny wrote on X. "These use cases are diverse and surprising — the reason is that the underlying Claude Agent is the best agent, and Opus 4.5 is the best model."

Recognizing this shadow usage, Anthropic effectively stripped the command-line complexity from its developer tool to create a consumer-friendly interface. In its blog post announcing the feature, Anthropic explained that developers "quickly began using it for almost everything else," which "prompted us to build Cowork: a simpler way for anyone — not just developers — to work with Claude in the very same way."

Inside the folder-based architecture that lets Claude read, edit, and create files on your computer

Unlike a standard chat interface where a user pastes text for analysis, Cowork requires a different level of trust and access. Users designate a specific folder on their local machine that Claude can access. Within that sandbox, the AI agent can read existing files, modify them, or create entirely new ones.
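
Anthropic has not published how Cowork enforces the folder boundary (Cherny mentions a built-in VM for isolation), so the following is only a sketch of the basic idea of path confinement, assuming a hypothetical `FolderSandbox` wrapper: every requested path is resolved, including symlinks and `..` segments, and rejected if it lands outside the designated folder.

```python
from pathlib import Path


class FolderSandbox:
    """Hypothetical wrapper that confines file operations to one root folder."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def _check(self, relative: str) -> Path:
        # Resolve ".." and symlinks first, so "../secrets" cannot escape the root.
        target = (self.root / relative).resolve()
        try:
            target.relative_to(self.root)
        except ValueError:
            raise PermissionError(f"{relative!r} is outside the sandbox") from None
        return target

    def read(self, relative: str) -> str:
        return self._check(relative).read_text()

    def write(self, relative: str, content: str) -> None:
        target = self._check(relative)
        target.parent.mkdir(parents=True, exist_ok=True)  # allow new subfolders
        target.write_text(content)
```

A real agent sandbox needs more than this (process isolation, limits on deletion), but the check shows why granting access to a single folder is a narrower grant than access to the whole machine.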

Anthropic offers several illustrative examples: reorganizing a cluttered downloads folder by sorting and intelligently renaming each file, generating a spreadsheet of expenses from a collection of receipt screenshots, or drafting a report from scattered notes across multiple documents.

"In Cowork, you give Claude access to a folder on your computer. Claude can then read, edit, or create files in that folder," the company explained on X. "Try it to create a spreadsheet from a pile of screenshots, or produce a first draft from scattered notes."

The architecture relies on what is known as an "agentic loop." When a user assigns a task, the AI does not merely generate a text response. Instead, it formulates a plan, executes steps in parallel, checks its own work, and asks for clarification if it hits a roadblock. Users can queue multiple tasks and let Claude process them simultaneously — a workflow Anthropic describes as feeling "much less like a back-and-forth and much more like leaving messages for a coworker."
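
A toy sketch of that loop, assuming hypothetical `plan` and `check` callables that stand in for model calls; it is not the Claude Agent SDK's API. It plans a task, runs the independent steps in parallel, verifies the results, and returns a clarification question instead of guessing when it hits a roadblock. Queued tasks are simply processed in turn.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Step = Callable[[], str]  # one independent unit of work, e.g. "read receipt 3"


@dataclass
class Result:
    done: bool
    outputs: List[str] = field(default_factory=list)
    question: Optional[str] = None  # set when the agent needs the user's help


def run_task(task: str, plan: Callable[[str], List[Step]],
             check: Callable[[List[str]], bool]) -> Result:
    """Plan, execute steps in parallel, verify, or ask for clarification."""
    steps = plan(task)
    if not steps:
        # Roadblock: no usable plan, so ask rather than guess.
        return Result(done=False, question=f"How would you like me to handle {task!r}?")
    with ThreadPoolExecutor() as pool:
        # pool.map runs steps concurrently but returns results in plan order.
        outputs = list(pool.map(lambda step: step(), steps))
    if not check(outputs):
        return Result(done=False, outputs=outputs,
                      question="Some results look wrong; can you clarify the task?")
    return Result(done=True, outputs=outputs)


def run_queue(tasks: List[str], plan: Callable[[str], List[Step]],
              check: Callable[[List[str]], bool]) -> List[Result]:
    """Work through several queued tasks, like messages left for a coworker."""
    return [run_task(t, plan, check) for t in tasks]


def toy_plan(task: str) -> List[Step]:
    # Hypothetical planner: one formatting step per receipt amount.
    if task == "list receipts":
        return [lambda amount=a: f"{amount:.2f}" for a in (12.5, 8.25, 4.0)]
    return []


results = run_queue(["list receipts", "book flights"], toy_plan, lambda outs: all(outs))
```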

The system is built on Anthropic's Claude Agent SDK, meaning it shares the same underlying architecture as Claude Code. Anthropic notes that Cowork "can take on many of the same tasks that Claude Code can handle, but in a more approachable form for non-coding tasks."

The recursive loop where AI builds AI: Claude Code reportedly wrote much of Claude Cowork

Perhaps the most remarkable detail surrounding Cowork's launch is the speed at which the tool was reportedly built — highlighting a recursive feedback loop where AI tools are being used to build better AI tools.

During a livestream hosted by Dan Shipper, Felix Rieseberg, an Anthropic employee, confirmed that the team built Cowork in approximately a week and a half.

Alex Volkov, who covers AI developments, expressed surprise at the timeline: "Holy shit Anthropic built 'Cowork' in the last... week and a half?!"

This prompted immediate speculation about how much of Cowork was itself built by Claude Code. Simon Smith, EVP of Generative AI at Klick Health, put it bluntly on X: "Claude Code wrote all of Claude Cowork. Can we all agree that we're in at least somewhat of a recursive improvement loop here?"

The implication is profound: Anthropic's AI coding agent may have substantially contributed to building its own non-technical sibling product. If true, this is one of the most visible examples yet of AI systems being used to accelerate their own development and expansion — a strategy that could widen the gap between AI labs that successfully deploy their own agents internally and those that do not.

Connectors, browser automation, and skills extend Cowork's reach beyond the local file system

Cowork doesn't operate in isolation. The feature integrates with Anthropic's existing ecosystem of connectors — tools that link Claude to external information sources and services such as Asana, Notion, PayPal, and other supported partners. Users who have configured these connections in the standard Claude interface can leverage them within Cowork sessions.

Additionally, Cowork can pair with Claude in Chrome, Anthropic's browser extension, to execute tasks requiring web access. This combination allows the agent to navigate websites, click buttons, fill forms, and extract information from the internet — all while operating from the desktop application.

"Cowork includes a number of novel UX and safety features that we think make the product really special," Cherny explained, highlighting "a built-in VM [virtual machine] for isolation, out of the box support for browser automation, support for all your claude.ai data connectors, asking you for clarification when it's unsure."

Anthropic has also introduced an initial set of "skills" specifically designed for Cowork that enhance Claude's ability to create documents, presentations, and other files. These build on the Skills for Claude framework the company announced in October, which provides specialized instruction sets Claude can load for particular types of tasks.

Why Anthropic is warning users that its own AI agent could delete their files

The transition from a chatbot that suggests edits to an agent that makes edits introduces significant risk. An AI that can organize files can, theoretically, delete them.

In a notable display of transparency, Anthropic devoted considerable space in its announcement to warning users about Cowork's potential dangers — an unusual approach for a product launch.

The company explicitly acknowledges that Claude "can take potentially destructive actions (such as deleting local files) if it's instructed to." Because Claude might occasionally misinterpret instructions, Anthropic urges users to provide "very clear guidance" about sensitive operations.

More concerning is the risk of prompt injection attacks — a technique where malicious actors embed hidden instructions in content Claude might encounter online, potentially causing the agent to bypass safeguards or take harmful actions.

"We've built sophisticated defenses against prompt injections," Anthropic wrote, "but agent safety — that is, the task of securing Claude's real-world actions — is still an active area of development in the industry."

The company characterized these risks as inherent to the current state of AI agent technology rather than unique to Cowork. "These risks aren't new with Cowork, but it might be the first time you're using a more advanced tool that moves beyond a simple conversation," the announcement notes.

Anthropic's desktop agent strategy sets up a direct challenge to Microsoft Copilot

The launch of Cowork places Anthropic in direct competition with Microsoft, which has spent years attempting to integrate its Copilot AI into the fabric of the Windows operating system with mixed adoption results.

However, Anthropic's approach differs in its isolation. By confining the agent to specific folders and requiring explicit connectors, the company is attempting to strike a balance between the utility of an OS-level agent and the security of a sandboxed application.

What distinguishes Anthropic's approach is its bottom-up evolution. Rather than designing an AI assistant and retrofitting agent capabilities, Anthropic built a powerful coding agent first — Claude Code — and is now abstracting its capabilities for broader audiences. This technical lineage may give Cowork more robust agentic behavior from the start.

Claude Code has generated significant enthusiasm among developers since its initial launch as a command-line tool in February 2025. The company expanded access with a web interface in October 2025, followed by a Slack integration in December. Cowork is the next logical step: bringing the same agentic architecture to users who may never touch a terminal.

Who can access Cowork now, and what's coming next for Windows and other platforms

For now, Cowork remains exclusive to Claude Max subscribers using the macOS desktop application. Users on other subscription tiers — Free, Pro, Team, or Enterprise — can join a waitlist for future access.

Anthropic has signaled clear intentions to expand the feature's reach. The blog post explicitly mentions plans to add cross-device sync and bring Cowork to Windows as the company learns from the research preview.

Cherny set expectations appropriately, describing the product as "early and raw, similar to what Claude Code felt like when it first launched."

To access Cowork, Max subscribers can download or update the Claude macOS app and click on "Cowork" in the sidebar.

The real question facing enterprise AI adoption

For technical decision-makers, the implications of Cowork extend beyond any single product launch. The bottleneck for AI adoption is shifting: model intelligence is no longer the limiting factor; workflow integration and user trust are.

Anthropic's goal, as the company puts it, is to make working with Claude feel less like operating a tool and more like delegating to a colleague. Whether mainstream users are ready to hand over folder access to an AI that might misinterpret their instructions remains an open question.

But the speed of Cowork's development — a major feature built in ten days, possibly by the company's own AI — previews a future where the capabilities of these systems compound faster than organizations can evaluate them.

The chatbot has learned to use a file manager. What it learns to use next is anyone's guess.

The $5 million lesson: Why accessibility should be part of your risk plan

Thu, 20 Nov 2025 05:00:00 GMT

Presented by AudioEye

In 2020, a blind customer named Juan Alcazar filed a lawsuit against Fashion Nova, alleging that the company’s website was inaccessible and denied blind customers the same access as everyone else.

It was, in many ways, an ordinary web accessibility lawsuit, one of many filed in federal court that year. Most ended the same way: management distraction, a pledge to fix accessibility issues, legal fees and then a five-figure settlement.

But this case didn’t settle. Fashion Nova fought it.

Five years and more than 200 filings later, Fashion Nova agreed to pay $5.15 million to settle what had become a class action lawsuit. The claim evolved from a single complaint into the second-largest accessibility settlement on record, surpassed only by Target's $6 million agreement in 2008.

It’s a stark reminder of how quickly an accessibility claim can escalate, and why every business leader should treat that risk as real, urgent, and solvable.

Accessibility lawsuits are increasing. So is risk.

Since 2020, the number of web accessibility lawsuits has steadily risen. In 2024, over 4,000 lawsuits were filed in the United States. And those are just the cases that reach court. Behind the scenes, demand letters are even more prevalent, with multiple sources, including Accessibility.com, estimating that over 250,000 letters are sent to businesses annually.

Often, these suits hinge on common accessibility issues, such as missing alt text or unlabeled forms. Under laws like the Americans with Disabilities Act (ADA) and California’s Unruh Civil Rights Act, plaintiffs don’t need to prove intent or significant harm. Simply encountering a barrier is enough.

Accessibility compliance isn't just a U.S. issue, either. The European Accessibility Act (EAA) took effect in June 2025, extending accessibility obligations to businesses offering covered digital products or services in the EU, including companies located outside the EU, which brings global brands into scope.

And while many assume that only large companies are subject to legal risk, in 2024, nearly three-quarters of web accessibility lawsuits targeted small and mid-sized businesses.

For billion-dollar brands like Fashion Nova, there’s always the option to dig in and fight an accessibility claim. However, for smaller businesses, the risk of a protracted legal battle and a substantial settlement can be too high. Settling quickly is often perceived as the ‘safe’ choice — a fact that firms that specialize in serial litigation are all too aware of.

Make accessibility a first line of defense

Accessibility lawsuits have a pattern: once a company is sued, it’s much more likely to be sued again. Accessibility.com also reports that in 2024, 48% of defendants had previously been sued for web accessibility barriers.

That’s why smart risk management isn’t just about having a plan to respond. It’s about lowering the chances of being on someone’s radar in the first place. And that starts with a process for finding and fixing accessibility issues before they’re brought to attention in a demand letter or legal claim:

Establish a baseline of website accessibility by using automated scans in conjunction with human reviews.
Prioritize high-severity barriers, or those most likely to spark a claim, and address them promptly.
Continuously monitor websites to find and fix issues before they become liabilities.
Document progress to prove ongoing improvement if questioned.
Integrate accessibility into everyday workflows, making accessibility part of organizational culture.

A scalable approach to accessibility compliance

AudioEye’s 2025 Digital Accessibility Index analyzed over 15,000 websites across different industries. The average site had 297 accessibility issues per page — including serious problems like unlabeled buttons or broken form fields.

Fixing a few accessibility issues might sound manageable. But when there are dozens of web pages, each with hundreds of unique issues, it quickly becomes an operational nightmare.

The biggest challenge is serving customers while addressing countless accessibility issues. Relying on developers to find and fix hundreds of issues per page is challenging enough, and every site update is another chance to introduce new barriers inadvertently. Automation helps, but no automated tool can find and fix everything (no matter what some accessibility companies claim in their marketing campaigns).

The most effective approach blends automation and human expertise to:

Scale detection and prevention: Let automation do the heavy lifting, scanning sites in real time and fixing new issues as they appear.
Tackle complex fixes: Rely on experts to test key pages and fix the high-risk issues that automation can’t handle on its own.

With these pieces in place, organizations can prevent serious accessibility issues from accumulating, while making sites much safer from legal claims. That’s risk management in action: not scrambling after the fact, but planning ahead with the right tools, expertise, and processes in place.

The cost of doing nothing

Even the smallest web accessibility claim can disrupt a business. Settlements may range from a few thousand dollars to six figures, but the actual cost goes further: legal fees, time spent resolving issues, and executive attention diverted from other priorities.

And then there’s the bigger risk: escalation. Alcazar v. Fashion Nova, Inc. won’t be the last large accessibility settlement, but it is a clear reminder of what can happen when an accessibility claim snowballs into something much larger.

The companies that manage risk best aren’t perfect. But they know where they stand. They can show progress. And when a claim comes in, they know how to respond.

Accessibility may not feel like a risk today. But if organizations wait until it does, it’s already too late.

Treat it like what it is: a business risk worth managing — before it becomes a business risk that can’t be ignored.

David Moradi is CEO of AudioEye.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

Writer's AI agents can actually do your work—not just chat about it

michael.nunez@venturebeat.com (Michael Nuñez) — Tue, 18 Nov 2025 16:00:00 GMT

Writer, a San Francisco-based artificial intelligence startup, is launching a unified AI agent platform designed to let any employee automate complex business workflows without writing code — a capability the company says distinguishes it from consumer-oriented tools like Microsoft Copilot and ChatGPT.

The platform, called Writer Agent, combines chat-based assistance with autonomous task execution in a single interface. Starting Tuesday, enterprise customers can use natural language to instruct the AI to create presentations, analyze financial data, generate marketing campaigns, or coordinate across multiple business systems like Salesforce, Slack, and Google Workspace—then save those workflows as reusable "Playbooks" that run automatically on schedules.

The announcement comes as enterprises struggle to move AI initiatives beyond pilot programs into production at scale. Writer CEO May Habib has been outspoken about this challenge, recently revealing that 42% of Fortune 500 executives surveyed by her company said AI is "tearing their company apart" due to coordination failures between departments.

"We're delivering an agent interface that is both incredibly powerful and radically simple to transform individual productivity into organizational impact," Habib said in a statement. "Writer Agent is the difference between a single sales rep asking a chatbot to write an outreach email and an enterprise ensuring that 1,000 reps are all sending on-brand, compliant, and contextually-aware messages to target accounts."

How Writer is putting workflow automation in the hands of non-technical workers

The platform's core innovation centers on making workflow automation accessible to non-technical employees—what Writer executives call "democratizing who gets to be a builder."

In an exclusive interview with VentureBeat, Doris Jwo, Writer's director of product management, demonstrated how the system works: A user types a request in plain English — for example, "Create a two-page partnership proposal between [Company A] and [Company B], make it a branded deck, include impact metrics and partnership tiers."

The AI agent then breaks down that request into discrete steps, conducts web research, generates graphics and charts on the fly, creates individual slides with sourced information, and assembles a complete presentation. The entire process, which might take an employee hours or days, can be completed in 10 to 12 minutes.

"The agent basically looks at the request, breaks it down, does research, understands what pieces it needs, creates a detailed plan at a step-by-step level," Jwo explained during a product demonstration. "It might say, 'I need to do web research,' or 'This user needs information from Gong or Slack,' and it reaches out to those connectors, grabs the data, and executes the plan."

Crucially, users can save these multi-step processes as Playbooks—reusable templates that colleagues can deploy with a single click. Routines allow those Playbooks to run automatically at scheduled intervals, essentially putting knowledge work "on autopilot."
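Jwo describes an agent that breaks a request into a step-by-step plan. Playbooks are saved, reusable versions of those plans. Both ideas can be modeled as an ordered list of steps executed against named connectors. This is a hypothetical sketch, not Writer's implementation. `Step` and `PlanExecutor` are illustrative names, and an LLM would normally draft the plan. The lambdas are stubs standing in for real web-search, Slack, and slide-building connectors. The `trace` list hints at the kind of step-by-step record Writer says it shows users.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One step of a plan: the connector to call and the input to pass it."""
    tool: str
    argument: str


@dataclass
class PlanExecutor:
    """Walks an ordered plan, routing each step to a named connector.

    Hypothetical sketch: a real agent would have an LLM draft the plan.
    """
    tools: dict[str, Callable[[str], str]]
    trace: list[str] = field(default_factory=list)  # human-readable record of each call

    def run(self, plan: list[Step]) -> list[str]:
        results = []
        for number, step in enumerate(plan, start=1):
            if step.tool not in self.tools:
                raise KeyError(f"step {number}: no connector named {step.tool!r}")
            results.append(self.tools[step.tool](step.argument))
            self.trace.append(f"step {number}: {step.tool}({step.argument!r})")
        return results


# In this model, a saved Playbook is just a reusable list of steps.
partnership_playbook = [
    Step("web_research", "Company A and Company B"),
    Step("slack", "#partnerships"),
    Step("build_slide", "Partnership tiers"),
]
executor = PlanExecutor(tools={
    "web_research": lambda query: f"notes on {query}",
    "slack": lambda channel: f"messages from {channel}",
    "build_slide": lambda title: f"slide: {title}",
})
results = executor.run(partnership_playbook)
```

Keeping the plan as plain data, separate from the executor, is what makes it shareable. A colleague can rerun the same list of steps without re-deriving it.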

Security and compliance controls: Writer's answer to enterprise IT concerns

Writer positions its enterprise-focused security and compliance controls as a key differentiator from competitors. While Microsoft, OpenAI, and Anthropic offer powerful AI capabilities, Writer's executives argue those tools weren't designed from the ground up for the security, compliance, and governance requirements of large regulated organizations.

"All of the products you mentioned are great products, but even Copilot is very much focused on personal productivity—summarizing email, for example, which is important, but that's not the component we're focusing on," said Matan-Paul Shetrit, Writer's director of product management, in an exclusive interview with VentureBeat.

Shetrit emphasized Writer's "trust, security, and interoperability" approach. IT administrators can granularly control what the AI can access — for instance, preventing market research agents from mentioning competitors, or restricting which employees can use web search capabilities. All activity is logged with detailed audit trails showing exactly what data the agent touched and what actions it took.

"These fine-grained controls are what make products enterprise-ready," Shetrit said. "We can deploy to tens of thousands or hundreds of thousands of employees while maintaining the security and guardrails you need for that scale."

This architecture reflects Writer's origin story. Unlike OpenAI or Anthropic, which started as research labs and later added enterprise offerings, Writer has targeted Fortune 500 companies since its 2020 founding. "We're not a research lab that went to consumer and is dabbling in enterprise," Shetrit said. "We are first and foremost targeting the Global 2000 and Fortune 500, and our research is in service of these customers' needs."

Inside Writer's strategy to connect AI agents across enterprise software systems

A critical technical component is Writer's approach to system integrations. The platform includes pre-built connectors to more than a dozen enterprise applications, including Google Workspace, Microsoft 365, Snowflake, Asana, Slack, Gong, HubSpot, Atlassian, Databricks, PitchBook, and FactSet. These connectors let the AI retrieve information and take actions across those systems.

Writer built these connectors using the Model Context Protocol (MCP), an emerging standard for AI system integrations, but added what Shetrit described as an "enterprise-ready" layer on top.

"We took a first-principle approach of: You have this MCP connector infrastructure—how do you build it in a way that's enterprise-ready?" Shetrit explained. "What we have today in the industry is definitely not it."

The system can write and execute code on the fly to handle unexpected scenarios. If a user uploads an unfamiliar file format, for instance, the agent will generate code to extract and process the text without requiring a human to intervene.

Jwo demonstrated this capability with a daily workflow she runs: Every morning at 10 a.m., a Routine automatically summarizes her Google Calendar meetings, identifies external participants, finds their LinkedIn profiles, and sends the summary to her via Slack — all without her involvement.
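Scheduling a Routine like Jwo's daily 10 a.m. summary comes down to two tasks: computing the next run time, and firing the saved Playbook when it is due. The sketch below assumes a simple polling `tick` and naive local datetimes, so it ignores daylight-saving edge cases. `Routine` and `next_daily_run` are hypothetical names, not part of Writer's product.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Callable, Optional


def next_daily_run(after: datetime, at: time) -> datetime:
    """Return the first datetime strictly after `after` whose clock time is `at`."""
    candidate = datetime.combine(after.date(), at)
    if candidate <= after:
        candidate += timedelta(days=1)
    return candidate


@dataclass
class Routine:
    """A saved Playbook plus a daily schedule (hypothetical model)."""
    name: str
    at: time
    playbook: Callable[[], str]
    next_run: datetime

    def tick(self, now: datetime) -> Optional[str]:
        """Run the Playbook if it is due, then schedule the next day's run."""
        if now < self.next_run:
            return None
        output = self.playbook()
        self.next_run = next_daily_run(now, self.at)
        return output


start = datetime(2025, 11, 18, 9, 0)
routine = Routine(
    name="morning meeting brief",
    at=time(10, 0),
    playbook=lambda: "summary of today's meetings sent via Slack",
    next_run=next_daily_run(start, time(10, 0)),
)
early = routine.tick(datetime(2025, 11, 18, 9, 30))    # not due yet, returns None
on_time = routine.tick(datetime(2025, 11, 18, 10, 0))  # due, so the Playbook runs
```

Rescheduling from the actual run time, rather than adding exactly 24 hours, keeps a late tick from drifting the schedule away from 10 a.m.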

"This was pretty simple, but you can imagine for a salesperson it might say, 'At the end of the day, wrap up a summary of all the calls I had, send me action items, post it to the account-specific Slack channel, and tag these folks so they can accomplish those workflows,'" Jwo said. "That can run continuously each day, each week, or on demand."

From mortgage lenders to CPG brands: Real-world AI agent use cases across industries

The platform is attracting customers across multiple industries. New American Funding, a mortgage lender, uses Writer Agent to automate marketing workflows. Senior Content Marketing Manager Karen Rodriguez uploads Asana project tickets with creative briefs, and the AI executes tasks like updating email campaigns or transforming articles into social media carousels, video scripts, and captions.

Other use cases span financial services teams creating investment dashboards with PitchBook and FactSet data, consumer packaged goods companies brainstorming new product lines based on social media trends, and marketing teams generating partnership presentations with branded assets.

Writer has added customers including TikTok, Comcast, Keurig Dr Pepper, CAA, and Aptitude Health, joining an existing base that includes Accenture, Qualcomm, Uber, Vanguard, and Marriott. The company now serves more than 300 enterprises and has secured over $50 million in signed contracts, with projections to double that to $100 million this year.

The startup's net retention rate, a measure of how much existing customers expand their usage, stands at 160%. That means existing customers, taken together, now spend 60% more than under their initial contracts. Twenty customers who started with $200,000 to $300,000 contracts now spend about $1 million annually, according to company data.
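The retention figures reduce to simple ratios. Net revenue retention is conventionally computed over a fixed cohort of existing customers. It counts expansion minus downgrades and churn, and it excludes new customers. A rate of 160% therefore means that cohort now spends 60% more in aggregate. The $1.0 million starting cohort below is a made-up example. The per-account multiples use only the figures quoted above.

```python
def revenue_ratio(start_revenue: float, end_revenue: float) -> float:
    """Current revenue from a fixed customer cohort (or account) over its starting revenue."""
    if start_revenue <= 0:
        raise ValueError("starting revenue must be positive")
    return end_revenue / start_revenue


# Hypothetical cohort that started at $1.0M of annual spend and now spends $1.6M.
nrr = revenue_ratio(1_000_000, 1_600_000)
net_expansion = nrr - 1

# The 20 accounts quoted above: $200k-$300k starting contracts, about $1M today.
multiple_from_300k = revenue_ratio(300_000, 1_000_000)
multiple_from_200k = revenue_ratio(200_000, 1_000_000)

print(f"NRR {nrr:.0%}, net expansion {net_expansion:.0%}")
print(f"growth {multiple_from_300k:.1f}x to {multiple_from_200k:.1f}x")
```

The same ratio shows that the 20 largest expanders grew roughly 3.3x to 5x. That is well above the cohort-wide 1.6x, which is consistent with an average that blends fast-growing accounts with flatter ones.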

'Vibe working': Writer's vision for AI-powered productivity beyond coding

Writer executives frame the platform as enabling what they call "vibe working," a playful reference to "vibe coding," the popular term for building software by prompting AI tools like Cursor, which dramatically accelerate development.

"We used to call it transformation when we took 12 steps and made them nine. That's optimizing the world as it is," Habib said at Writer's AI Leaders Forum earlier this month, according to Forbes. "We can now create a new world. That is the greenfield mindset."

Shetrit echoed this framing: "Vibe coding is the theme of 2025. Our view is that ‘vibe working’ is the theme of 2026. How do you bring the same productivity gains you've seen with coding agents into the workspace in a way that non-technical users can maximize them?"

The platform is powered by Palmyra X5, Writer's proprietary large language model featuring a one-million-token context window — among the largest commercially available. Writer trained the model for approximately $700,000, a fraction of the estimated $100 million OpenAI spent on GPT-4, by using synthetic data and techniques that halt training when returns diminish.

The model can process one million tokens in about 22 seconds and costs 60 cents per million input tokens and $6 per million output tokens — significantly cheaper than comparable offerings, according to company specifications.
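To make the quoted pricing concrete, the sketch below estimates the cost of one request from its token counts. The per-token prices and the 22-second figure are Writer's stated numbers. The 2,000-token response length is an assumption chosen for illustration. Real bills may differ with caching, rounding, or contract terms.

```python
INPUT_USD_PER_MILLION = 0.60   # Writer's stated Palmyra X5 input price
OUTPUT_USD_PER_MILLION = 6.00  # Writer's stated Palmyra X5 output price


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the quoted per-million-token prices."""
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000


# Filling the full 1M-token context window and getting a 2,000-token answer (assumed length).
full_context = request_cost(1_000_000, 2_000)

# Throughput implied by "one million tokens in about 22 seconds".
tokens_per_second = 1_000_000 / 22

print(f"${full_context:.3f} per request, ~{tokens_per_second:,.0f} tokens/s")
```

Output tokens cost ten times as much as input tokens. For long-context reading, such as summarizing a large document, the input side dominates. Generation-heavy tasks shift the balance toward output.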

Making AI decisions visible: Writer's approach to trust and transparency

A distinctive aspect of Writer's approach is transparency into the AI's decision-making process. The interface displays the agent's step-by-step reasoning, showing which data sources it accessed, what code it generated, and how it arrived at outputs.

"There's a very clear exhibition of how the agent is thinking, what it's doing, what it's touching," Shetrit said. "This is important for the end user to trust it, but also important for the IT person or security professional to see what's going on."

This "supervision" model goes beyond simple observability of API calls to encompass what Shetrit described as "a superset of observability" — giving organizations the ability to not just monitor but control AI behavior through policies and permissions.

Session logs capture all agent activity when enabled by administrators, and users can submit feedback on every output to help improve system performance. The platform also emphasizes providing sources and citations for generated content, allowing users to verify information.

"With any sort of chat assistant, agentic or not, trust but verify is really important," Jwo said. "That's part of the pillars of us building this and making it enterprise-grade."

What Writer Agent costs, and why it's included in the base platform

Writer is including all the new capabilities—Playbooks, Routines, Connectors, and Personality customization—as part of its core platform without additional charges, according to Jwo.

"This is fully included as part of the Writer platform," she said. "We're not charging additional for using Writer Agent."

The "Personality" feature allows individual users, teams, or entire organizations to customize the AI's communication style, ensuring generated content matches brand voice and tone guidelines. This works alongside company-level controls that enforce terminology and style requirements.

For highly structured, repetitive tasks, Writer also offers a library of more than 100 pre-built agents and an AI Studio for building custom multi-agent systems aligned with specific business use cases.

The race to define enterprise AI: Can purpose-built platforms beat tech giants?

The launch crystallizes a fundamental tension in how enterprises will adopt AI at scale. While consumer-facing AI tools emphasize individual productivity gains, companies need systems that work reliably across thousands of employees, integrate with existing software infrastructure, maintain regulatory compliance, and deliver measurable business impact.

Writer's wager is that these requirements demand purpose-built enterprise platforms rather than consumer tools adapted for business use. The company's $1.9 billion valuation — achieved in a November 2024 funding round that raised $200 million — suggests investors see merit in this thesis. Backers include Premji Invest, Radical Ventures, ICONIQ Growth, Salesforce Ventures, and Adobe Ventures.

Yet the competitive landscape remains formidable. Microsoft and Google command enormous distribution advantages through their existing enterprise software relationships. OpenAI and Anthropic possess research capabilities that have produced breakthrough models. Whether Writer can maintain its differentiation as these giants expand their enterprise offerings will test the startup's core premise: that serving Fortune 500 companies from day one creates advantages that research labs turned enterprise vendors cannot easily replicate.

"We're entering an era where if you can describe a better way to work, you can build it," Jwo said. "The new Writer Agent democratizes who gets to be a builder, empowering the operational experts and creative problem-solvers in every department to become the architects of their own transformation. That's how you unlock innovation that competitors can't replicate."

The promise is alluring — AI capabilities powerful enough to transform how work gets done, accessible enough for any employee to use, and controlled enough for enterprises to deploy safely at scale. Whether Writer can deliver on that promise at the speed and scale required will determine if its vision of "vibe working" becomes the 2026 theme Shetrit predicts, or just another ambitious attempt to solve enterprise AI's execution problem.

But one thing is certain: In a market where 85% of AI initiatives fail to escape pilot purgatory, Writer is betting that the winners won't be the companies with the most powerful models—they'll be the ones that make those models actually work inside the enterprise.

Microsoft remakes Windows for an era of autonomous AI agents

michael.nunez@venturebeat.com (Michael Nuñez) — Tue, 18 Nov 2025 16:00:00 GMT

Microsoft is fundamentally restructuring its Windows operating system to become what executives call the first "agentic OS," embedding the infrastructure needed for autonomous AI agents to operate securely at enterprise scale — a watershed moment in the evolution of personal computing that positions the 40-year-old platform as the foundation for a new era of human-machine collaboration.

The company announced Tuesday at its Ignite conference that it is introducing native agent infrastructure directly into Windows 11, allowing AI agents — autonomous software programs that can perform complex, multi-step tasks on behalf of users — to discover tools, execute workflows, and interact with applications through standardized protocols while operating in secure, policy-controlled environments separate from user sessions.

The shift is Microsoft's most significant architectural evolution of Windows since the introduction of the modern security model. It transforms the operating system from a platform where users manually orchestrate applications into one where, in the words of Pavan Davuluri, President of Windows & Devices at Microsoft, you can "simply express your desired outcome, and agents handle the complexity."

"Windows 11 starts with this notion of secure by design, secure by default," Davuluri said in an exclusive interview with VentureBeat. "And a lot of the work that we're doing today, when we think about the engagement we have with our customers, the expectations they have with us is making sure we are building upon the fact that Windows is the most secure platform for them and is the most resilient platform as well."

The announcements arrive as enterprises are experimenting with AI agents but struggling with fragmented tooling, security concerns, and lack of centralized management — challenges that Microsoft believes only operating system-level integration can solve. The stakes are enormous: with Windows running on an estimated 1.4 billion devices globally, Microsoft's architectural choices will likely shape how organizations deploy autonomous AI systems for years to come.

New platform primitives create foundation for agent computing

At the core of Microsoft's vision are three new platform capabilities entering preview that fundamentally change how agents operate on Windows. Agent Connectors provide native support for the Model Context Protocol (MCP), an open standard introduced by Anthropic that allows AI agents to connect with external tools and data sources. Microsoft has built what it calls an "on-device registry" — a secure, manageable repository where developers can register their applications' capabilities as agent connectors, making them discoverable to any compatible agent on the system.

"These are platform capabilities that then become available to all of our customers," Davuluri explained, describing how the Windows file system, for example, becomes an agent connector that any MCP-compatible agent can access with user consent. "We're able to do this in a fashion that can scale for one but it also allows others to participate in the Windows registry for MCP."

The architecture introduces an MCP proxy layer that handles authentication, authorization, and auditing for all communication between agents and connectors. Microsoft is launching with two built-in agent connectors for File Explorer and System Settings, allowing agents to manage files or adjust system configurations like switching between light and dark mode — all with explicit user permission.
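The proxy layer's role can be illustrated with a small mediator that every agent call passes through. It checks the caller's identity and checks that the agent has been granted that connector. It records the attempt either way. This is a hypothetical sketch under those assumptions. `ConnectorProxy` and its fields are illustrative names, not Windows or MCP APIs. The connectors are stubs standing in for File Explorer and System Settings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class AuditEntry:
    agent_id: str
    connector: str
    request: str
    allowed: bool
    at: datetime


@dataclass
class ConnectorProxy:
    """Mediates agent-to-connector calls: authenticate, authorize, then log every attempt."""
    connectors: dict[str, Callable[[str], str]]
    known_agents: set[str]                                     # identities the proxy can verify
    grants: set[tuple[str, str]] = field(default_factory=set)  # (agent_id, connector) pairs
    audit_log: list[AuditEntry] = field(default_factory=list)

    def call(self, agent_id: str, connector: str, request: str) -> str:
        authenticated = agent_id in self.known_agents
        allowed = (authenticated
                   and connector in self.connectors
                   and (agent_id, connector) in self.grants)
        # Denied attempts are logged too, so auditors see what agents tried to do.
        self.audit_log.append(
            AuditEntry(agent_id, connector, request, allowed, datetime.now(timezone.utc)))
        if not allowed:
            raise PermissionError(f"{agent_id!r} may not use {connector!r}")
        return self.connectors[connector](request)


proxy = ConnectorProxy(
    connectors={
        "file_explorer": lambda req: f"files matching {req!r}",
        "system_settings": lambda req: f"applied {req!r}",
    },
    known_agents={"agent-1"},
    grants={("agent-1", "file_explorer")},
)
listing = proxy.call("agent-1", "file_explorer", "*.xlsx")
denied = None
try:
    proxy.call("agent-1", "system_settings", "dark mode")
except PermissionError as err:
    denied = str(err)
```

Centralizing the checks in one choke point means individual connectors do not each need to reimplement identity, permission, and logging logic.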

Agent Workspace, entering private preview, represents perhaps the most significant security innovation. It creates what Microsoft describes as "a contained, policy-controlled, and auditable environment where agents can interact with software" — essentially a parallel desktop session where agents operate with their own distinct identity, completely separate from the user's primary session.

"We want to be able to have clarity in the identity of the agent that is operating in the local operating system," Davuluri said, addressing security concerns about agents accessing sensitive data. "We want that session to be a session that is secure, that is policy control, that is manageable, that has transparency and auditability."

Each agent workspace runs with minimal privileges by default, accessing only explicitly granted resources. The system maintains detailed audit logs distinguishing agent actions from user actions — critical for enterprises that need to prove compliance and track all changes to systems and data.

Windows 365 for Agents extends this infrastructure to the cloud, turning Microsoft's Cloud PC offering into execution environments for agents. Instead of running on local devices, agents can operate in secure, policy-controlled virtual machines in Azure, enabling what Microsoft calls "computer-using agents" to interact with legacy applications and perform automation tasks at scale without consuming local compute resources.

Taskbar becomes command center for monitoring AI agents at work

The infrastructure enables significant user interface changes designed to make agents as commonplace as applications. Microsoft is introducing "Ask Copilot on the taskbar," a unified entry point in preview that combines Microsoft 365 Copilot, agent invocation, and traditional search in a single interface.

Users will be able to invoke agents using "@" mentions directly from the taskbar, then monitor their progress through familiar UI patterns like hover cards, progress badges, and notifications — all while continuing other work. When an agent completes a task or needs input, it surfaces updates through the taskbar without disrupting the user's primary workflow.

"We've evolved and created new UX in the taskbar to reflect the unique needs of agents performing background tasks on your behalf," said Navjot Virk, Corporate Vice President of Windows Experiences, describing features like progress bars and status badges that indicate when agents are working, need approval, or have completed tasks.

The design philosophy, Virk emphasized, centers on user control. "These experiences are designed to be opt in. We want to give customers full control over when and how they engage with copilots and agents."

For commercial Microsoft 365 Copilot users, the integration goes deeper. Microsoft is embedding Copilot directly into File Explorer, allowing users to ask questions, generate summaries, or draft emails based on document contents without leaving the file management interface. On Copilot+ PCs — devices with neural processing units capable of 40 trillion operations per second — new capabilities include converting any on-screen table into an Excel spreadsheet through the Click to Do feature.

Microsoft bets on open standards against Apple and Google's proprietary approaches

Microsoft's embrace of the open Model Context Protocol, created by Anthropic, marks a strategic bet on openness as enterprises evaluate competing AI platforms from Apple and Google that use proprietary frameworks.

"Windows is an open platform, and by virtue [of being] an open platform, we certainly have the ability to take existing technologies, evolve, harden, adapt those, but we also allow customers to bring their own capabilities to the platform as well," Davuluri said when asked about competing with Apple Intelligence and Google's Android AI for Enterprise.

The company demonstrated this openness with Claude, Anthropic's AI assistant, accessing the Windows file system through agent connectors with user consent — one of numerous partnerships Microsoft has secured. Dynamics 365 is using the File Explorer connector to streamline expense reporting, reducing what was previously a 30-minute, dozen-step process to "one sentence with high accuracy," according to Microsoft's blog post. Other early partners include Manus AI, Dropbox Dash, Roboflow, and Infosys.

"Windows is the platform in which they build upon," Davuluri said of enterprise customers. "And so our ability to take those existing bodies of work they have, and extend them is the, I think, the least friction way for them to go, learn, adopt, experiment and find ways to [scale]."

Security model enforces strict containment and mandatory user consent

Microsoft's security model for agents adheres to what it calls "secure by default" policies aligned with the company's broader Secure Future Initiative. All agent connectors registered in the on-device registry must meet strict requirements around packaging and identity, with applications properly packaged and signed by trusted sources. Developers must explicitly declare the minimum capabilities their agent connectors require, and agents and connectors run in isolated environments with dedicated agent user accounts, separate from human user accounts. Windows requires explicit user approval when agents first access sensitive resources like files or system settings.

"We give Windows the ability to go deliver on the security expectations, and then it is auditable at the end of the day," Davuluri said. "You still want an auditability log that looks similar to perhaps what you use in the cloud. And so all three pieces are built into the design and architecture of Agent Workspace."

For IT administrators, Microsoft is introducing management policies through Intune and Group Policy that allow organizations to enable or disable agent features at device and account levels, set minimum security policy levels, and access event logs enumerating all agent connector invocations and errors. The company emphasized that agents operate with restricted privileges, with minimal permissions by default and access granted only to explicitly approved resources that users can revoke at any time.
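The controls described above work in layers: an administrator switch, default-deny access, explicit approval on first use, and revocation at any time. That layering can be sketched as a small consent gate. The names here, including `ConsentManager` and its `ask_user` callback, are hypothetical. They do not correspond to Intune, Group Policy, or Windows APIs. The callback stands in for the consent prompt a user would see.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ConsentManager:
    """Default-deny gate: org policy first, then remembered per-resource user consent."""
    agents_enabled: bool                   # admin switch (e.g., pushed by device policy)
    ask_user: Callable[[str, str], bool]   # shown only when no approval is on record
    approved: set[tuple[str, str]] = field(default_factory=set)

    def check(self, agent_id: str, resource: str) -> bool:
        if not self.agents_enabled:
            return False                   # organization policy overrides user choices
        key = (agent_id, resource)
        if key in self.approved:
            return True                    # consent already given, no new prompt
        if self.ask_user(agent_id, resource):
            self.approved.add(key)         # remember the approval
            return True
        return False                       # nothing is granted without approval

    def revoke(self, agent_id: str, resource: str) -> None:
        """Withdraw consent; the next access prompts the user again."""
        self.approved.discard((agent_id, resource))


prompts = []


def ask_user(agent_id: str, resource: str) -> bool:
    prompts.append(resource)
    return resource == "Documents"         # this user approves Documents only


consent = ConsentManager(agents_enabled=True, ask_user=ask_user)
first = consent.check("agent-1", "Documents")          # prompts, approved
repeat = consent.check("agent-1", "Documents")         # remembered, no prompt
settings = consent.check("agent-1", "SystemSettings")  # prompts, declined
consent.revoke("agent-1", "Documents")
after_revoke = consent.check("agent-1", "Documents")   # prompts again
```

Checking the administrator policy before any stored consent means an organization can switch agent access off without needing to clear every user's individual approvals.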

Post-quantum cryptography and recovery tools address emerging and persistent threats

Beyond agent infrastructure, Microsoft announced significant security and resilience updates addressing both emerging and persistent enterprise challenges. Post-Quantum Cryptography APIs are now generally available in Windows, allowing organizations to begin migrating to encryption algorithms designed to withstand future quantum computing attacks that could break today's cryptographic standards. Microsoft worked closely with the National Institute of Standards and Technology to implement these algorithms.

"We are introducing post quantum cryptography APIs in Windows," Davuluri said. "For customers who want to be able to do cryptographic encryption in their workloads, they can start taking advantage of these APIs in Windows for the first time. That is a huge step forward for us when we think about the future of windows."

Hardware-accelerated BitLocker will arrive on new devices starting spring 2026, offloading disk encryption to dedicated silicon for faster performance while providing hardware-level key protection. Sysmon functionality is becoming generally available as part of Windows in early 2026, bringing advanced forensics and threat detection capabilities previously available only as a separate download directly into the operating system's event logging system.

The company also detailed progress on its Windows Resiliency Initiative, launched a year ago following the CrowdStrike incident that disrupted 8.5 million Windows devices globally. New recovery capabilities include Quick Machine Recovery, now with expanded networking support and Autopatch management, which lets IT remotely fix devices stuck in the Windows Recovery Environment. Point-in-time restore, entering preview, rolls devices back to earlier states to resolve update conflicts or configuration errors. Cloud rebuild, also in preview, lets IT remotely rebuild malfunctioning devices by downloading fresh installation media and using Autopilot for zero-touch provisioning.

Microsoft is also raising security requirements for third-party drivers across the Windows ecosystem. Following updated requirements for antivirus drivers effective April 1, 2025, the company is expanding this approach to other driver classes including networking, cameras, USB, printers, and storage — requiring higher certification standards, adding compiler safeguards, and providing more Windows in-box drivers to reduce reliance on third-party kernel-mode code.

Measured rollout reflects enterprise caution around autonomous software

Microsoft is positioning these updates as essential infrastructure for what it calls "Frontier Firms" — organizations that "blend human ingenuity with intelligent systems to deliver real outcomes." However, the company emphasized a cautious, opt-in approach that reflects enterprise concerns about autonomous software agents.

"The principles we're using in designing these new platform capabilities accounts for the reality that we have a very, very broad user base," Davuluri said. "A lot of the features and capabilities we're building are opt in capabilities. And so it is our goal to be able to have users find value in the workflow and meet them."

Virk emphasized the measured approach: "This is more about meeting customers where they are and then taking them on this journey when they are ready. So there's the optionality, but also having support for it. And really important thing is that they should feel comfortable. They should feel secure."

Microsoft's bet is that only operating system-level integration can provide the security, governance, and user experience required for mainstream AI agent adoption. Whether that vision materializes will depend on developer adoption, enterprise comfort with autonomous software, and Microsoft's ability to balance innovation with the stability that 40 years of Windows customers expect. After four decades of putting users in control of their computers, Windows is now asking them to share that control with machines.

How AI tax startup Blue J torched its entire business model for ChatGPT—and became a $300 million company

michael.nunez@venturebeat.com (Michael Nuñez) — Tue, 18 Nov 2025 14:00:00 GMT

In the winter of 2022, as the tech world was becoming mesmerized by the sudden, explosive arrival of OpenAI’s ChatGPT, Benjamin Alarie faced a pivotal choice. His legal tech startup, Blue J, had a respectable business built on the AI of a bygone era, serving hundreds of law firms with predictive models. But it had hit a ceiling.

Alarie, a tenured tax law professor at the University of Toronto, saw the nascent, error-prone, yet powerful capabilities of large language models not as a curiosity, but as the future. He made a high-stakes decision: to pivot his entire company, which had been painstakingly built over nearly a decade, and rebuild it from the ground up on this unproven technology.

That bet has paid off handsomely. Blue J has since quietly secured a $122 million Series D funding round co-led by Oak HC/FT and Sapphire Ventures, placing the company's valuation at over $300 million. The move transformed Blue J from a niche player into one of Canada's fastest-growing legal tech firms, multiplying its revenue roughly twelve-fold and attracting 10 to 15 new customers every day.

The company now serves more than 3,500 organizations, including global accounting giant KPMG UK and several Fortune 500 companies. It is tackling a critical bottleneck in the professional services industry: a severe and worsening talent shortage. The U.S. has 340,000 fewer accountants than it did five years ago, and with 75% of current CPAs expected to retire in the next decade, firms are desperate for tools that can amplify the productivity of their remaining experts.

“What once took tax professionals 15 hours of manual research to do can now be completed in about 15 seconds with Blue J,” Alarie, the company's CEO, said in an exclusive interview with VentureBeat. "That value proposition—we can take hours of work and turn it into seconds of work—that is driving a lot of this."

When the dean's biography was wrong: the moment that changed everything

Alarie vividly remembers January 2023, when the dean of the law school stopped by his office to exchange New Year's greetings. He brought up ChatGPT and prompted it to describe her. ChatGPT confidently generated a biography. Some details were accurate; others were completely fabricated.

"She was like, 'Okay, this is really kind of scary. This is wrong, and this has implications,'" Alarie said. Yet that moment of obvious failure didn't deter him. Instead, it crystallized his conviction.

The company's first iteration, launched in 2015, used supervised machine learning to build predictive models that could forecast judicial outcomes on specific tax issues. While technically sophisticated, it had a fundamental limitation in scope.

"The challenge was it couldn't answer every tax research question, which was really the holy grail," Alarie said. Customers loved the tool when it applied to their problem, but would quickly abandon it when it didn't. Revenue plateaued around $2 million annually.

Despite ChatGPT's notorious hallucinations, Alarie convinced his board to make the pivot. "I had this conviction that if we continued down that path, we weren't going to be able to address our number one limitation," he said. "Large language models seemed like a very promising direction."

He gave his team six months to deliver a working product.

From 90-second responses to 3 million queries: How Blue J tamed AI hallucinations

By August 2023, Blue J was ready to launch. What they released was, in Alarie's candid assessment, "super janky." The system took 90 seconds to respond. About half the answers had issues. The Net Promoter Score registered at just 20.
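For readers unfamiliar with the metric: Net Promoter Score is conventionally computed from a 0-to-10 "how likely are you to recommend us?" survey as the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 to 6), so it ranges from -100 to 100. The minimal Python sketch below applies that standard formula to a hypothetical set of ten responses that happens to produce a score of 20. It is not Blue J's survey data, and the article does not describe how Blue J collects its scores.

```python
def net_promoter_score(ratings: list[int]) -> float:
    """Standard NPS from 0-10 'how likely to recommend' ratings.

    Promoters score 9-10, detractors 0-6; passives (7-8) count only in the total.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be on a 0-10 scale")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical survey of ten users: 4 promoters, 4 passives, 2 detractors.
sample = [10, 9, 9, 10, 8, 7, 7, 8, 5, 3]
print(net_promoter_score(sample))  # 20.0
```

Because passives dilute the score without subtracting from it, moving from 20 to the mid-80s requires both converting detractors and turning most passives into promoters.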

What transformed that flawed product into today's platform (response times measured in seconds, a dissatisfaction rate of just one in 700 queries, and an NPS in the mid-80s) was a relentless focus on three strategic pillars.

First is proprietary content at massive scale. Blue J secured exclusive licensing with Tax Analysts (publisher of Tax Notes) and IBFD, the Amsterdam-based International Bureau of Fiscal Documentation, whose global tax coverage spans 220+ jurisdictions. "We are the only platform on earth that takes in the best U.S. tax information from Tax Notes and the best global tax information from IBFD," Alarie said.

Second is deep human expertise. Blue J employs tax experts led by Susan Massey, who spent 13 years at the IRS Office of Chief Counsel as Branch Chief for Corporate Tax. Her team constantly tests the AI and refines its performance.

Third is a feedback flywheel. With over 3 million tax research queries processed in 2025, Blue J is amassing a large body of real-world usage data, and each query generates feedback that flows back into the system.

Weekly active user rates hover between 75% and 85%, compared to 15% to 25% for traditional platforms. "A charitable ratio is like we're five times more intensively used," Alarie noted.
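Alarie's "five times" sits at one end of a range. Dividing the quoted weekly-active-user percentages against each other, as in this small Python calculation that uses only the figures above (the variable names are illustrative), gives 3x at the least favorable pairing, 4x at the midpoints, and about 5.7x at the most favorable pairing.

```python
# Weekly active user (WAU) rate ranges quoted in the article, in percent.
blue_j_low, blue_j_high = 75, 85
traditional_low, traditional_high = 15, 25

# Least favorable comparison: Blue J's low end against the traditional high end.
min_ratio = blue_j_low / traditional_high
# Most favorable ("charitable") comparison: Blue J's high end against the traditional low end.
max_ratio = blue_j_high / traditional_low
# Comparison of the midpoints of both ranges.
mid_ratio = ((blue_j_low + blue_j_high) / 2) / ((traditional_low + traditional_high) / 2)

print(f"{min_ratio:.2f}x to {max_ratio:.2f}x, midpoint {mid_ratio:.2f}x")
# 3.00x to 5.67x, midpoint 4.00x
```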

Inside Blue J's early access partnership with OpenAI

Blue J maintains an unusually close relationship with OpenAI that has proven crucial to its success. "We have a very good relationship with OpenAI, and we get early access to their models," Alarie said. "It's quite collaborative. We give them a lot of really high quality feedback about how well different versions of forthcoming models are performing."

This feedback proves valuable because Blue J has developed what Alarie calls "ecologically valid" test questions — drawn from actual tax professional queries, with correct answers determined by Blue J's expert team. This helps OpenAI improve performance on complex reasoning tasks.

The company tests models from all major providers — OpenAI, Anthropic, Google's Gemini, and open-source alternatives — continuously evaluating which performs best. "We're not necessarily 100% committed to any particular provider," he explained. "We're testing all the time."

This approach helps Blue J navigate a challenging business model: charging approximately $1,500 per seat annually for unlimited queries while absorbing variable compute costs. "We've pre-committed to delivering them a really good user experience, unlimited tax research answers at a fixed price," Alarie said. "We're absorbing a lot of that risk."
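The risk Alarie describes comes from fixing revenue per seat while compute cost scales with usage. The Python sketch below makes that trade-off concrete. Only the roughly $1,500 seat price comes from the article; the per-query compute cost and query volume are hypothetical placeholders, and real costs vary by model and by query.

```python
SEAT_PRICE_USD = 1_500.0  # approximate annual per-seat price reported in the article


def seat_margin(queries_per_year: int, cost_per_query: float) -> float:
    """Per-seat revenue left after compute costs only (all other costs ignored)."""
    return SEAT_PRICE_USD - queries_per_year * cost_per_query


def break_even_queries(cost_per_query: float) -> float:
    """Yearly queries per seat at which compute cost uses up the entire seat price."""
    if cost_per_query <= 0:
        raise ValueError("cost_per_query must be positive")
    return SEAT_PRICE_USD / cost_per_query


# Hypothetical inputs: $0.25 of model compute per query, 1,000 queries per seat per year.
print(seat_margin(1_000, 0.25))  # 1250.0
print(break_even_queries(0.25))  # 6000.0
```

Falling API prices lower cost_per_query, which pushes the break-even point higher and widens the cushion against heavier-than-expected usage.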

Competition among foundation model providers creates downward pressure on API pricing, while Blue J's conservative usage modeling has proven accurate. Gross revenue retention exceeds 99%, while net revenue retention reaches 130% — considered best-in-class for SaaS businesses.
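Gross and net revenue retention are usually defined over a cohort of existing customers: gross retention subtracts revenue lost to churn and downgrades, so it cannot exceed 100%, while net retention also adds expansion revenue from those same customers, such as extra seats. A minimal Python sketch of those common definitions follows; the cohort figures are hypothetical, picked only to land near the 99% and 130% cited above, and are not Blue J's actual numbers.

```python
def retention_rates(start_arr: float, churn: float, contraction: float,
                    expansion: float) -> tuple[float, float]:
    """Return (gross, net) revenue retention, in percent, for one customer cohort.

    Gross retention ignores expansion, so it can never exceed 100%;
    net retention adds expansion revenue from the same customers back in.
    """
    if start_arr <= 0:
        raise ValueError("start_arr must be positive")
    lost = churn + contraction
    gross = 100 * (start_arr - lost) / start_arr
    net = 100 * (start_arr - lost + expansion) / start_arr
    return gross, net


# Hypothetical cohort: $1,000,000 starting ARR, $5,000 churned,
# $3,000 lost to downgrades, $308,000 gained from expansion (e.g. added seats).
print(retention_rates(1_000_000, 5_000, 3_000, 308_000))  # (99.2, 130.0)
```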

Taking on Thomson Reuters and LexisNexis with 75% weekly engagement

Blue J faces competition from established publishers like Thomson Reuters, LexisNexis, and Bloomberg, all of which announced AI capabilities throughout 2023 and 2024. Yet Blue J's engagement metrics suggest it has captured significant momentum, growing from just 200 customers in 2021 to over 3,500 organizations today.

Daily content updates prove crucial. While the tax code itself changes only when Congress acts, the ecosystem around it evolves constantly through IRS regulations, new rulings, and court cases. All 50 states modify their tax codes regularly.

"Things are changing literally every day," Alarie said. "Every day we're updating the materials, and that's just the U.S. We cover Canada, we cover the UK. The aspirations are truly global for this thing."

Alarie's ambitions extend beyond building a successful startup. As author of the award-winning book "The Legal Singularity" and faculty affiliate at the Vector Institute for Artificial Intelligence, he has spent years contemplating AI's long-term impact on law.

In academic papers published in Tax Notes throughout 2023 and 2024, he chronicled generative AI's rise, predicting that "clients will become substantially more sophisticated" and that AI would push human experts toward higher-value strategic roles rather than routine research.

Blue J's $122 million plan: From tax research to 'global tax cognition'

The Series D funding, which brought total capital raised to over $133 million, will fuel aggressive geographic and product expansion. Blue J already operates in the U.S., Canada, and the U.K., with plans to eventually cover 220+ jurisdictions through its IBFD partnership.

Future capabilities could include automated memo generation, tax form completion, document drafting, and conversation history that maintains context across sessions, transforming Blue J from a research tool into what Alarie describes as "the operating layer for global tax cognition."

For all its success, Blue J operates in a domain where errors carry serious consequences. The hallucination problem hasn't been eliminated — it's been minimized through careful engineering, content curation, and human oversight. Blue J has trained its models to acknowledge when they cannot answer a question rather than fabricate information.

The business also faces economic risks if compute costs spiral or usage patterns exceed projections. And subtler questions loom about professional judgment: as AI systems become more capable, will users defer to outputs without sufficient critical evaluation?

From 15 hours to 15 seconds: What Blue J's AI pivot teaches every industry

Blue J's transformation offers lessons beyond tax software. The company's willingness to abandon eight years of proprietary technology and rebuild on an initially unreliable foundation required both courage and calculated risk-taking.

The decision paid off not because generative AI was inherently superior to supervised machine learning in all dimensions, but because it addressed the right problem: comprehensiveness rather than precision in narrow domains. Tax professionals didn't need 95% accuracy on 5% of questions. They needed good-enough accuracy on 100% of questions.

The improvement from an NPS of 20 to 84 in just over two years reflects relentless iteration informed by massive data collection. The content partnerships created differentiation that pure technology couldn't replicate. The team of tax experts provided the domain knowledge needed to ensure reliability.

Most fundamentally, Blue J recognized that the real competition wasn't other AI startups or even established publishers. It was the old way of doing things — the 15 hours of manual research, the institutional knowledge locked in retiring professionals' heads.

"People are like, 'What does Blue J do? They provide better tax answers. Okay, I think we need that,'" Alarie reflected.

As AI transforms profession after profession, that clarity of purpose may matter more than technological sophistication. The future belongs not to those who build the most advanced AI, but to those who most effectively harness it to solve problems humans actually have.

For a tax law professor who started with frustration about inefficient research methods, building a $300 million company marks an audacious endpoint. For the thousands of professionals now answering complex questions in 15 seconds instead of 15 hours, it represents the future of their profession, arriving faster than most expected.

The bet on ChatGPT, made while it was still hallucinating biographies, has validated the idea that sometimes the riskiest move is not to move at all.

Upwork study shows AI agents excel with human partners but fail independently

michael.nunez@venturebeat.com (Michael Nuñez) — Thu, 13 Nov 2025 18:30:00 GMT

Artificial intelligence agents powered by the world's most advanced language models routinely fail to complete even straightforward professional tasks on their own, according to groundbreaking research released Thursday by Upwork, the largest online work marketplace.

But the same study reveals a more promising path forward: When AI agents collaborate with human experts, project completion rates surge by up to 70%, suggesting the future of work may not pit humans against machines but rather pair them together in powerful new ways.

The findings, drawn from more than 300 real client projects posted to Upwork's platform, mark the first systematic evaluation of how human expertise amplifies AI agent performance in actual professional work, not synthetic tests or academic simulations. The research challenges both the hype around fully autonomous AI agents and fears that such technology will imminently replace knowledge workers.

"AI agents aren't that agentic, meaning they aren't that good," Andrew Rabinovich, Upwork's chief technology officer and head of AI and machine learning, said in an exclusive interview with VentureBeat. "However, when paired with expert human professionals, project completion rates improve dramatically, supporting our firm belief that the future of work will be defined by humans and AI collaborating to get more work done, with human intuition and domain expertise playing a critical role."

How AI agents performed on 300+ real freelance jobs—and why they struggled

Upwork's Human+Agent Productivity Index (HAPI) evaluated how three leading AI systems — Gemini 2.5 Pro, OpenAI's GPT-5, and Claude Sonnet 4 — performed on actual jobs posted by paying clients across categories including writing, data science, web development, engineering, sales, and translation.

Critically, Upwork deliberately selected simple, well-defined projects where AI agents stood a reasonable chance of success. These jobs, priced under $500, represent less than 6% of Upwork's total gross services volume — a tiny fraction of the platform's overall business and an acknowledgment of current AI limitations.

"The reality is that although we study AI, and I've been doing this for 25 years, and we see significant breakthroughs, the reality is that these agents aren't that agentic," Rabinovich told VentureBeat. "So if we go up the value chain, the problems become so much more difficult, then we don't think they can solve them at all, even to scratch the surface. So we specifically chose simpler tasks that would give an agent some kind of traction."

Even on these deliberately simplified tasks, AI agents working independently struggled. But when expert freelancers provided feedback — spending an average of just 20 minutes per review cycle — the agents' performance improved substantially with each iteration.

20 minutes of human feedback boosted AI completion rates up to 70%

The research reveals stark differences in how AI agents perform with and without human guidance across different types of work. For data science and analytics projects, Claude Sonnet 4 achieved a 64% completion rate working alone but jumped to 93% after receiving feedback from a human expert. In sales and marketing work, Gemini 2.5 Pro's completion rate rose from 17% independently to 31% with human input. OpenAI's GPT-5 showed similarly dramatic improvements in engineering and architecture tasks, climbing from 30% to 50% completion.

The pattern held across virtually all categories, with agents responding particularly well to human feedback on qualitative, creative work requiring editorial judgment — areas like writing, translation, and marketing — where completion rates increased by up to 17 percentage points per feedback cycle.
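Figures in this piece mix two units: absolute gains in percentage points and relative gains in percent, which can differ widely for the same result. The small Python calculation below, using only the three solo-versus-assisted pairs quoted above, shows how each change reads in both units. It is plain arithmetic and says nothing about how Upwork computed its own headline figures.

```python
def gains(solo: int, assisted: int) -> tuple[int, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    return assisted - solo, 100 * (assisted - solo) / solo


# (model, category): (solo completion %, completion % after expert feedback), as quoted.
results = {
    ("Claude Sonnet 4", "data science & analytics"): (64, 93),
    ("Gemini 2.5 Pro", "sales & marketing"): (17, 31),
    ("GPT-5", "engineering & architecture"): (30, 50),
}

for (model, category), (solo, assisted) in results.items():
    pp, rel = gains(solo, assisted)
    print(f"{model}, {category}: +{pp} pp ({rel:.0f}% relative)")
```

This prints gains of 29, 14, and 20 percentage points, or about 45%, 82%, and 67% in relative terms. A modest absolute gain on a low baseline (17% to 31%) yields the largest relative jump.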

The finding challenges a fundamental assumption in the AI industry: that agent benchmarks conducted in isolation accurately predict real-world performance.

"While we show that in the tasks that we have selected for agents to perform in isolation, they perform similarly to the previous results that we've seen published openly, what we've shown is that in collaboration with humans, the performance of these agents improves surprisingly well," Rabinovich said. "It's not just a one-turn back and forth, but the more feedback the human provides, the better the agent gets at performing."

Why ChatGPT can ace the SAT but can't count the R's in 'strawberry'

The research arrives as the AI industry grapples with a measurement crisis. Traditional benchmarks — standardized tests that AI models can master, sometimes scoring perfectly on SAT exams or mathematics olympiads — have proven poor predictors of real-world capability.

"With advances of large language models, what we're now seeing is that these static, academic datasets are completely saturated," Rabinovich said. "So you could get a perfect score in the SAT test or LSAT or any of the math olympiads, and then you would ask ChatGPT how many R's there are in the word strawberry, and it would get it wrong."
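The ground truth in Rabinovich's example is trivial for conventional code, which is part of what makes the failure striking. The error is commonly attributed to language models processing subword tokens rather than individual letters, though the article does not go into the cause. A deterministic Python check:

```python
word = "strawberry"
# Count occurrences of "r" and record where each one appears (0-based indices).
count = word.count("r")
positions = [i for i, ch in enumerate(word) if ch == "r"]
print(count, positions)  # 3 [2, 7, 8]
```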

This phenomenon — where AI systems ace formal tests but stumble on trivial real-world questions — has led to growing skepticism about AI capabilities, even as companies race to deploy autonomous agents. Several recent benchmarks from other firms have tested AI agents on Upwork jobs, but those evaluations measured only isolated performance, not the collaborative potential that Upwork's research reveals.

"We wanted to evaluate the quality of these agents on actual real work with economic value associated with it, and not only see how well these agents do, but also see how these agents do in collaboration with humans, because we sort of knew already that in isolation, they're not that advanced," Rabinovich explained.

For Upwork, which connects roughly 800,000 active clients posting more than 3 million jobs annually to a global pool of freelancers, the research serves a strategic business purpose: establishing quality standards for AI agents before allowing them to compete or collaborate with human workers on its platform.

The economics of human-AI teamwork: Why paying for expert feedback still saves money

Despite requiring multiple rounds of human feedback — each lasting about 20 minutes — the time investment remains "orders of magnitude different between a human doing the work alone, versus a human doing the work with an AI agent," Rabinovich said. Where a project might take a freelancer days to complete independently, the agent-plus-human approach can deliver results in hours through iterative cycles of automated work and expert refinement.

The economic implications extend beyond simple time savings. Upwork recently reported that gross services volume from AI-related work grew 53% year-over-year in the third quarter of 2025, one of the strongest growth drivers for the company. But executives have been careful to frame AI not as a replacement for freelancers but as an enhancement to their capabilities.

"AI was a huge overhang for our valuation," Erica Gessert, Upwork's CFO, told CFO Brew in October. "There was this belief that all work was going to go away. AI was going to take it, and especially work that's done by people like freelancers, because they are impermanent. Actually, the opposite is true."

The company's strategy centers on enabling freelancers to handle more complex, higher-value work by offloading routine tasks to AI. "Freelancers actually prefer to have tools that automate the manual labor and repetitive part of their work, and really focus on the creative and conceptual part of the process," Rabinovich said.

Rather than replacing jobs, he argues, AI will transform them: "Simpler tasks will be automated by agents, but the jobs will become much more complex in the number of tasks, so the amount of work and therefore earnings for freelancers will actually only go up."

AI coding agents excel, but creative writing and translation still need humans

The research reveals a clear pattern in agent capabilities. AI systems perform best on "deterministic and verifiable" tasks with objectively correct answers, like solving math problems or writing basic code. "Most coding tasks are very similar to each other," Rabinovich noted. "That's why coding agents are becoming so good."

In Upwork's tests, web development, mobile app development, and data science projects — especially those involving structured, computational work — saw the highest standalone agent completion rates. Claude Sonnet 4 completed 68% of web development jobs and 64% of data science projects without human help, while Gemini 2.5 Pro achieved 74% on certain technical tasks.

But qualitative work proved far more challenging. When asked to create website layouts, write marketing copy, or translate content with appropriate cultural nuance, agents floundered without expert guidance. "When you ask it to write you a poem, the quality of the poem is extremely subjective," Rabinovich said. "Since the rubrics for evaluation were provided by humans, there's some level of variability in representation."

Writing, translation, and sales and marketing projects showed the most dramatic improvements from human feedback. For writing work, completion rates increased by up to 17 percentage points after expert review. Engineering and architecture projects requiring creative problem-solving — like civil engineering or architectural design — improved by as much as 23 percentage points with human oversight.

This pattern suggests AI agents excel at pattern matching and replication but struggle with creativity, judgment, and context — precisely the skills that define higher-value professional work.

Inside the research: How Upwork tested AI agents with peer-reviewed scientific methods

Upwork partnered with elite freelancers on its platform to evaluate every deliverable produced by AI agents, both independently and after each cycle of human feedback. These evaluators created detailed rubrics defining whether projects met core requirements specified in job descriptions, then scored outputs across multiple iterations.

Importantly, evaluators focused only on objective completion criteria, excluding subjective factors like stylistic preferences or quality judgments that might emerge in actual client relationships. "Rubric-based completion rates should not be viewed as a measure of whether an agent would be paid in a real marketplace setting," the research notes, "but as an indicator of its ability to fulfill explicitly defined requests."

This distinction matters: an AI agent might technically meet every specified requirement yet still produce work a client rejects as inadequate. Subjective client satisfaction, the true measure of marketplace success, remains beyond current measurement capabilities.
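Upwork's full scoring procedure is not spelled out here, but the rule described (a deliverable counts only if it fulfills the explicitly defined requirements) can be sketched as an all-requirements-must-pass check. In this Python sketch the class, function, and requirement names are hypothetical, and treating each rubric item as a binary pass/fail check is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Deliverable:
    job_id: str
    # Hypothetical rubric: core requirement -> whether the evaluator marked it as met.
    rubric_checks: dict[str, bool]


def is_complete(d: Deliverable) -> bool:
    # Complete only if the rubric is non-empty and every core requirement passes.
    return bool(d.rubric_checks) and all(d.rubric_checks.values())


def completion_rate(deliverables: list[Deliverable]) -> float:
    """Share of deliverables (in percent) that pass their entire rubric."""
    if not deliverables:
        return 0.0
    return 100 * sum(is_complete(d) for d in deliverables) / len(deliverables)


batch = [
    Deliverable("job-1", {"uses requested framework": True, "responsive layout": True}),
    Deliverable("job-2", {"meets word count": True, "cites sources": False}),
    Deliverable("job-3", {"full text translated": True}),
    Deliverable("job-4", {"model trained": True, "report delivered": True}),
]
print(completion_rate(batch))  # 75.0
```

Under this rule, one missed requirement sinks an otherwise strong deliverable, while a deliverable that passes every item still counts even if a real client would dislike it, which is the gap described above.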

The research underwent double-blind peer review and was accepted to NeurIPS, the premier academic conference for AI research, where Upwork will present full results in early December. The company plans to publish a complete methodology and make the benchmark available to the research community, updating the task pool regularly to prevent overfitting as agents improve.

"The idea is for this benchmark to be a living and breathing platform where agents can come in and evaluate themselves on all categories of work, and the tasks that will be offered on the platform will always update, so that these agents don't overfit and basically memorize the tasks at hand," Rabinovich said.

Upwork's AI strategy: Building Uma, a 'meta-agent' that manages human and AI workers

The research directly informs Upwork's product roadmap as the company positions itself for what executives call "the age of AI and beyond." Rather than building its own AI agents to complete specific tasks, Upwork is developing Uma, a "meta orchestration agent" that coordinates between human workers, AI systems, and clients.

"Today, Upwork is a marketplace where clients look for freelancers to get work done, and then talent comes to Upwork to find work," Rabinovich explained. "This is getting expanded into a domain where clients come to Upwork, communicate with Uma, this meta-orchestration agent, and then Uma identifies the necessary talent to get the job done, gets the tasks outcomes completed, and then delivers that to the client."

In this vision, clients would interact primarily with Uma rather than directly hiring freelancers. The AI system would analyze project requirements, determine which tasks require human expertise versus AI execution, coordinate the workflow, and ensure quality — acting as an intelligent project manager rather than a replacement worker.

"We don't want to build agents that actually complete the tasks, but we are building this meta orchestration agent that figures out what human and agent talent is necessary in order to complete the tasks," Rabinovich said. "Uma evaluates the work to be delivered to the client, orchestrates the interaction between humans and agents, and is able to learn from all the interactions that happen on the platform how to break jobs into tasks so that they get completed in a timely and effective manner."

The company recently announced plans to open its first international office in Lisbon, Portugal, by the fourth quarter of 2026, with a focus on AI infrastructure development and technical hiring. The expansion follows Upwork's record-breaking third quarter, driven partly by AI-powered product innovation and strong demand for workers with AI skills.

OpenAI, Anthropic, and Google race to build autonomous agents—but reality lags hype

Upwork's findings arrive amid escalating competition in the AI agent space. OpenAI, Anthropic, Google, and numerous startups are racing to develop autonomous agents capable of complex multi-step tasks, from booking travel to analyzing financial data to writing software.

But recent high-profile stumbles have tempered initial enthusiasm. AI agents frequently misunderstand instructions, make logical errors, or produce confidently wrong results, the last of which researchers call "hallucination." The gap between controlled demonstration videos and reliable real-world performance remains vast.

"There have been some evaluations that came from OpenAI and other platforms where real Upwork tasks were considered for completion by agents, and across the board, the reported results were not very optimistic, in the sense that they showed that agents—even the best ones, meaning powered by most advanced LLMs — can't really compete with humans that well, because the completion rates are pretty low," Rabinovich said.

Rather than waiting for AI to fully mature, a timeline that remains uncertain, Upwork is betting on a hybrid approach that leverages AI's strengths (speed, scalability, pattern recognition) while retaining human strengths (judgment, creativity, contextual understanding).

This philosophy extends to learning and improvement. Current AI models train primarily on static datasets scraped from the internet, supplemented by human preference feedback. But most professional work is qualitative, making it difficult for AI systems to know whether their outputs are actually good without expert evaluation.

"Unless you have this collaboration between the human and the machine, where the human is kind of the teacher and the machine is the student trying to discover new solutions, none of this will be possible," Rabinovich said. "Upwork is very uniquely positioned to create such an environment because if you try to do this with, say, self-driving cars, and you tell Waymo cars to explore new ways of getting to the airport, like avoiding traffic signs, then a bunch of bad things will happen. In doing work on Upwork, if it creates a wrong website, it doesn't cost very much, and there's no negative side effects. But the opportunity to learn is absolutely tremendous."

Will AI take your job? The evidence suggests a more complicated answer

While much public discourse around AI focuses on job displacement, Rabinovich argues the historical pattern suggests otherwise — though the transition may prove disruptive.

"The narrative in the public is that AI is eliminating jobs, whether it's writing, translation, coding or other digital work, but no one really talks about the exponential amount of new types of work that it will create," he said. "When we invented electricity and steam engines and things like that, they certainly replaced certain jobs, but the amount of new jobs that were introduced is exponentially more, and we think the same is going to happen here."

The research identifies emerging job categories focused on AI oversight: designing effective human-machine workflows, providing high-quality feedback to improve agent performance, and verifying that AI-generated work meets quality standards. These skills—prompt engineering, agent supervision, output verification—barely existed two years ago but now command premium rates on platforms like Upwork.

"New types of skills from humans are becoming necessary in the form of how to design the interaction between humans and machines, how to guide agents to make them better, and ultimately, how to verify that whatever agentic proposals are being made are actually correct, because that's what's necessary in order to advance the state of AI," Rabinovich said.

The question remains whether this transition, from doing tasks to overseeing them, will create opportunities as quickly as it disrupts existing roles. For freelancers on Upwork, the answer may already be emerging in their bank accounts: The platform saw AI-related work grow 53% year-over-year, even as fears of AI-driven unemployment dominated headlines.