Ensuring the success of AI agent deployments

AI agents are transforming business operations, but their high failure rates highlight a critical need for better data, robust infrastructure, and ongoing human oversight to unlock their promised efficiency and value. Success hinges on practical deployment and rigorous safeguards.



We’re in the midst of a massive wave of enthusiasm for AI agents and what they can bring to businesses. Many companies are building AI agents, but not all of them ultimately deliver the business value they’re meant to.


Built to connect with large language models (LLMs) and other resources using complex workflows, AI agents can plan, reason and execute actions to achieve a given goal. They break an instruction down into smaller tasks and choose which tools to use. They can work individually (single-agent architecture) or as a team (multi-agent architecture).
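
To make that pattern concrete, here’s a minimal sketch of the plan-and-act loop in Python. The `call_llm` helper and both tools are hypothetical stubs, not any particular framework’s API:

```python
# A minimal single-agent loop: plan, pick a tool, act, repeat.
# call_llm and both tools are hypothetical stand-ins.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to your LLM provider of choice."""
    return "FINISH: (replace this stub with a real model call)"

TOOLS = {
    "search": lambda query: f"top results for {query!r}",   # stub web search
    "summarise": lambda text: text[:100],                   # stub summariser
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Ask the model to pick the next action: "tool: input" or "FINISH: answer".
        decision = call_llm("\n".join(history) + "\nNext action?")
        action, _, payload = decision.partition(":")
        action, payload = action.strip(), payload.strip()
        if action == "FINISH":
            return payload                      # the agent's final answer
        result = TOOLS[action](payload)         # execute the chosen tool
        history.append(f"{action}: {payload} -> {result}")
    return "Stopped: step budget exhausted."
```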


2024 saw leading enterprises such as OpenAI, Google DeepMind, Microsoft, and PwC begin to integrate AI agents into their operations. Research from Capgemini shows that 51% of organisations intend to invest further in AI agents in 2025, with that figure projected to reach 82% within the next three years. 


But a significant gap remains between adoption and performance. Researchers at Carnegie Mellon University (CMU) and Salesforce measured the successful completion rate of AI agents and found that it hovered around 30 to 35% for multi-step tasks.


It’s remarkable that, despite these failure rates, so many companies are so bullish on AI agents, which points to their enormous potential to increase operational efficiency. So why are success rates currently so poor, and what can enterprises do to prevent AI agent deployments from joining the lists of failed projects?

Strengthen your data foundation 

AI agent performance is determined to a great extent by the quality and breadth of the data agents can access. Using data that’s outdated, narrow, or biased undermines the accuracy, objectivity, relevance, and speed of the output, sometimes with catastrophic results. 


It’s one thing to receive unreliable data insights; it’s quite another for an AI agent to send emails that contain offensive language, misdiagnose patients, transfer funds to the wrong account, or purchase shares of a stock that shows no sign of increasing in value. 


The open web is the best source of fresh, diverse, broad datasets, but importing fresh data from the open internet at scale isn’t easy. Using a data retrieval infrastructure solution like Bright Data’s MCP server can help ensure that your agents have access to the most useful and up-to-date information. The server scales automatically and performs consistently as your data needs grow. By unblocking public data and connecting smoothly to your LLM, Bright Data brings broad datasets directly to the models that underpin your AI agents. 

Break tasks down 

There are limitations to what AI agents can do, at least for now and possibly for a long while. Research from AI evaluation company METR and Oxford’s Toby Ord points to an “AI agent half-life”: the task length at which an agent’s chance of successfully completing the task falls to 50%. 


According to the research, the longer and more complex the task, the lower the success rate for AI agents. So if an AI system has a 50% chance of completing an hour-long task, it has only a 25% chance of completing a two-hour task, and 99% reliability comes only with tasks around 1/70th the length of the agent’s half-life. 
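
These figures follow from a simple exponential-decay model, where the success probability halves with each additional half-life of task length. A quick sketch to verify the arithmetic above:

```python
# Back-of-the-envelope check of the half-life figures in the paragraph above.
import math

def success_probability(task_hours: float, half_life_hours: float) -> float:
    """P(success) = 0.5 ** (task length / half-life)."""
    return 0.5 ** (task_hours / half_life_hours)

half_life = 1.0                                # 50% success on 1-hour tasks
print(success_probability(2.0, half_life))     # 0.25 -> a 2-hour task succeeds ~25% of the time

# Task length at which reliability reaches 99%:
t_99 = math.log(0.99) / math.log(0.5) * half_life
print(t_99)                                    # ~0.0145 hours, i.e. roughly 1/70th of the half-life
```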


In practical terms, this means that even though AI agents are designed to break down complicated tasks, overloading an agent is likely to result in failure. Set smaller, reasonable objectives that are aligned with agent capabilities. It’s just as important to check that the tasks you set don’t conflict with security or privacy regulations; otherwise, the agent may be blocked from accessing the sensitive data it needs to complete the task. 
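
As an illustration, here’s a hypothetical sketch of that decomposition approach, with `run_agent` and `verify` as stand-in stubs rather than a real API:

```python
# Run a goal as a sequence of small, verifiable subtasks instead of one
# monolithic instruction. run_agent and verify are hypothetical stubs.

def run_agent(subtask: str) -> str:
    return f"output for {subtask!r}"            # stub: replace with a real agent call

def verify(subtask: str, output: str) -> bool:
    return bool(output)                         # stub: replace with a real check

def run_pipeline(subtasks: list[str]) -> list[str]:
    results = []
    for subtask in subtasks:
        output = run_agent(subtask)             # keep each step short and well-scoped
        if not verify(subtask, output):         # validate before building on the result
            raise RuntimeError(f"Subtask failed verification: {subtask}")
        results.append(output)
    return results

# Smaller objectives keep each step well inside the agent's "half-life".
run_pipeline([
    "Fetch last quarter's sales figures from the approved data source",
    "Summarise the top three trends in under 200 words",
    "Draft an email to the sales team with the summary attached",
])
```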

Build robust, scalable infrastructure

AI agents can demand a lot of compute power, but sometimes they’re deployed in a hurry on existing infrastructure that isn’t designed for the load. While this might work initially, the system tends to come apart at the seams as the number of agents or the complexity of their interactions grows. 


It’s crucial to lower these risks of failure by choosing a scalable architecture with efficient resource management. Cloud-based solutions are the most effective: stateless deployment removes local dependencies, and serverless systems reduce resource waste. 


Vercel’s AI Cloud is a good option. It uses infrastructure-as-code (IaC) to enable fluid resource reuse, streamlined provisioning, and minimal compute for both burst and idle workloads. 

Check your guardrails 

It’s possible that AI agents won’t ever achieve 100% reliability. They send emails to the wrong addresses, make incorrect recommendations, and approve requests that shouldn’t be permitted. 


AI agents may fail in both the reasoning and the planning stages, and they don’t always make good choices about which tools to use. While you can minimise these risks, they’ll still lurk in the shadows. That’s why it’s vital to be mindful about how much autonomy you give to AI agents. 


Maintain oversight, including audit trails, compliance checks, and ongoing monitoring, and always keep humans in the loop with approval workflows and the option to escalate tasks to human operators. 
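
As a rough illustration, here’s a minimal approval gate in Python. The action names, the `perform` executor, and the console prompt are all hypothetical placeholders for whatever workflow tooling you actually use:

```python
# A minimal human-in-the-loop gate, sketched with a console prompt. In
# production the approval step would be a ticket, chat message, or review UI.

RISKY_ACTIONS = {"send_email", "transfer_funds", "approve_request"}
AUDIT_LOG: list[dict] = []

def perform(action: str, payload: dict) -> str:
    return f"executed {action}"                 # stub: replace with real side effects

def execute_with_approval(action: str, payload: dict) -> str:
    AUDIT_LOG.append({"action": action, "payload": payload})   # audit trail first
    if action in RISKY_ACTIONS:
        answer = input(f"Agent wants to run {action} with {payload}. Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return "escalated to a human operator"             # denied or ignored
    return perform(action, payload)
```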

Bake in recovery and fault tolerance 

It’s not a question of if your AI agent will hit snags, freeze, or stall, but when, so you need to plan for those eventualities. AI agents have to be able to recover from errors, keep running, maintain performance, and avoid crashing even when the system encounters faults and ambiguity. 


This requires baking in redundancy, meaning multiple agents running in parallel that can pick up smoothly if their “colleagues” drop the ball. Fault tolerance involves embedding intelligent recovery systems such as self-healing mechanisms and smart retry workflows that automatically attempt recovery using exponential backoff strategies. 
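
A retry workflow with exponential backoff fits in a few lines. In this sketch, the `TransientError` class is an assumption standing in for whatever recoverable failures your agent steps raise:

```python
# A smart-retry sketch with exponential backoff and jitter. TransientError
# marks failures worth retrying (timeouts, rate limits); anything else fails fast.
import random
import time

class TransientError(Exception):
    """Raised by a step when a retry is likely to help."""

def retry_with_backoff(step, max_attempts: int = 5, base_delay: float = 1.0):
    for attempt in range(max_attempts):
        try:
            return step()
        except TransientError:
            if attempt == max_attempts - 1:
                raise                                     # out of attempts: escalate
            delay = base_delay * 2 ** attempt             # 1s, 2s, 4s, 8s, ...
            time.sleep(delay + random.uniform(0, delay))  # jitter avoids stampedes
```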


Stateful recovery, or restarting from the last known good state, uses persistent storage to save state and context and supports faster recovery. DBOS offers an open-source durable execution library that helps build more reliable, fault-tolerant backends and reduces the need for stitched-together orchestration, which introduces additional points of failure. 
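
Durable execution frameworks handle this for you, but the underlying idea can be sketched with a plain JSON checkpoint file. This is a generic illustration, not DBOS’s API, and `run_agent` is again a hypothetical stub:

```python
# Stateful recovery via a JSON checkpoint file: every completed step is
# persisted, so a restart resumes from the last known good state, not from zero.
import json
from pathlib import Path

CHECKPOINT = Path("agent_state.json")

def run_agent(step: str) -> str:
    return f"output for {step!r}"               # stub agent call

def load_state() -> dict:
    return json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {"done": []}

def save_state(state: dict) -> None:
    CHECKPOINT.write_text(json.dumps(state))

def run_workflow(steps: list[str]) -> None:
    state = load_state()
    for step in steps:
        if step in state["done"]:
            continue                            # already finished before the crash
        run_agent(step)
        state["done"].append(step)
        save_state(state)                       # persist after every successful step
```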

Test, test, and test again

All too often, AI agents are delivered with the promise of perfect operations and then fail repeatedly. “But it worked in development” is a common refrain. This is partly because it’s challenging to evaluate agent performance, since you can’t easily validate outputs against expected results. Agents operate in dynamic environments with complex interactions, so it’s difficult to establish clear success metrics. 


You can reduce the chances of failure by ramping up your test scenarios. Feedback loops and automated testing using CI/CD pipelines help reveal and fix faults to enable continuous improvement based on performance data. 
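
For instance, a handful of scenario tests can run on every commit. This pytest sketch assumes a hypothetical `run_agent_json` helper that returns a structured result rather than free text:

```python
# A pytest-style sketch of regression tests for agent outcomes, suitable for a
# CI/CD pipeline. The scenarios and run_agent_json stub are hypothetical.
import pytest

def run_agent_json(task: str) -> dict:
    """Stub: replace with a real agent call that returns a structured result."""
    return {"outcome": "escalated" if "outside" in task else "refund_issued"}

SCENARIOS = [
    ("Refund order #123 under the standard policy", "refund_issued"),
    ("Refund order #456 outside the returns window", "escalated"),
]

@pytest.mark.parametrize("task,expected", SCENARIOS)
def test_agent_outcome(task, expected):
    # Free-text LLM output is hard to diff; assert on a structured outcome
    # field the agent reports instead of matching exact wording.
    assert run_agent_json(task)["outcome"] == expected
```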


Ideally, you’d use a sandbox like that offered by The Agent Company to assess agent performance in real-world situations before releasing them to production. This is an environment designed to imitate business operations, allowing you to try AI agents on typical tasks like internet search, code writing, running apps, and sending messages. 

Your AI agent project can deliver results 

Although many AI agents fail to actualise their potential, there’s no reason for your deployment to fall into that category. By paying attention to the most common points of failure and acting to reduce their occurrence, you can successfully implement an AI agent system that meets your expectations and increases business efficiency and productivity.