16 must-read AI agent playbooks
Plus, key takeaways to help you level up fast.
WELCOME, EXECUTIVES AND PROFESSIONALS.
The true benchmark for AI agents is their impact on the bottom line.
Yet, too often, the conversation gets stuck on definitions and theoretical benchmarks instead of practical results.
The real questions: Are they driving cost savings? Are they increasing revenue? Are they enabling your teams to do more meaningful work?
I've reviewed dozens of AI agent playbooks.
Here are 16 must-reads to help you realize value:
MCKINSEY

Image source: McKinsey & Company
Brief: McKinsey released a CEO playbook to address the “gen AI paradox,” outlining how AI agents can unlock scalable value and the CEO’s strategic mandate to lead transformation in the agentic era.
Breakdown:
Nearly eight in ten companies report using gen AI, yet just as many report no significant bottom-line impact. Think of it as the “gen AI paradox.”
Enterprise-wide copilots scaled fast but offer limited value; 90% of transformative, vertical use cases remain stuck in pilot mode.
AI agents can break the paradox by shifting gen AI from a reactive tool to a proactive, goal-driven collaborator, automating complex processes.
Beyond efficiency gains, agents unlock agility and revenue streams. Realizing this value demands rethinking workflows from the ground up.
A new AI architecture paradigm, the ‘Agentic AI Mesh’, is needed to enable scale, but the real challenge is human: earning trust and driving adoption.
Why it’s important: Agentic AI is not an incremental step; it is the foundation of the next-generation operating model. CEOs who act now won’t just gain a performance edge. They will redefine how their organizations think, decide, and execute. The time for exploration is ending. The time for transformation is now.
Full playbook here.
GOOGLE CLOUD

Image source: Google Cloud
Brief: Google published ‘The AI agent handbook’, a 46-slide deck outlining 10 practical ways to use AI agents in business, including challenges, solutions, how to get started in Google Agentspace, and real enterprise examples.
Breakdown:
Agentspace offers prebuilt agents that can search enterprise data, generate ideas, conduct deep research, shorten sales cycles, and more.
Teams can build custom agents using Agent Designer or Vertex AI Agent Builder, then securely deploy them to Agentspace for enterprise-wide use.
For instance, your colleagues are discussing a complex topic. It's clearly important, but there’s no time to review all the background material.
You prompt an AI agent to summarize the relevant reports and data. Within moments, you have a clear summary in time for the meeting.
The handbook includes enterprise examples from organizations like Deloitte, Nokia, Verizon, Decathlon, and Gordon Food Service.
Why it’s important: By 2028, 33% of enterprise applications will include agentic AI, up from under 1% in 2024, enabling 15% of daily work decisions to be made autonomously. Google's handbook shows how Agentspace, launched in December 2024, helps companies operationalize AI agents in impactful ways.
BEST PRACTICE INSIGHT

Image source: McKinsey & Company
Brief: As agentic AI starts to reshape how decisions are made at scale, McKinsey encourages organizations to rethink governance, trust, and operating models, or risk falling behind in the next frontier of AI transformation.
Breakdown:
Organizations should treat AI agents like corporate citizens: governed, accountable, and expected to deliver value, just like human employees.
Enterprises need to rearchitect how decisions are made and work is done, enabling humans and AI agents to collaborate in complementary ways.
McKinsey provides a decision-making framework that classifies decisions based on risk and the level of judgment required.
Low-risk, low-complexity tasks, like verifying account details or checking claim status, are ideal candidates for full automation.
High-risk, high-judgment decisions, like fraud detection or exception handling, still benefit from human oversight, assisted by AI copilots.
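To make the framework concrete, here is a minimal sketch of routing decisions by risk and required judgment. The function, labels, and examples are illustrative assumptions, not code from McKinsey’s playbook.
```python
# Illustrative sketch (not from the McKinsey playbook): routing decisions
# to full automation or human review based on risk and required judgment.

from dataclasses import dataclass

@dataclass
class Decision:
    name: str
    risk: str        # "low" or "high"
    judgment: str    # "low" or "high"

def route(decision: Decision) -> str:
    """Return the handling mode suggested by the risk/judgment framing."""
    if decision.risk == "low" and decision.judgment == "low":
        return "automate"            # e.g. verify account details, check claim status
    if decision.risk == "high" and decision.judgment == "high":
        return "human_with_copilot"  # e.g. fraud detection, exception handling
    return "human_review"            # mixed cases default to oversight

print(route(Decision("check_claim_status", risk="low", judgment="low")))   # automate
print(route(Decision("fraud_escalation", risk="high", judgment="high")))   # human_with_copilot
```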
Why it’s important: To unlock the full value of AI, enterprises will need to shift from task automation to decision design, focusing not on what can be automated but on which decisions should be. This requires AI agents with defined roles, accountability, and performance metrics, embedded into the operating model.
Full playbook here.
IBM

Image source: IBM
Brief: IBM published a 36-page report exploring agentic AI opportunities, risks, and responsible implementation, and why financial institutions must reimagine governance from the ground up. Most insights are relevant across industries.
Breakdown:
Agentic AI is creating a novel risk landscape beyond traditional AI, requiring a shift in risk management due to its self-directed nature.
For instance, as principal agents (depicted above) delegate to service and task agents, human intent can be distorted or lost.
When agents handle KYC, loan approvals, or fraud detection, real-time monitoring is critical. The report outlines over 50 controls.
IBM stresses “compliance by design,” embedding risk controls directly into AI systems as integral components of system architecture.
It also covers the evolution of AI agents, outlines practical steps to get started, and includes a 27-step RACI matrix to manage agentic AI.
Why it’s important: Agentic AI is early in its development but already planning, executing, and escalating decisions across onboarding, fraud, loans, and compliance. The choices enterprises make today will determine whether they lead at the frontier or fall behind, reacting to competitors.
Full playbook here.
PALO ALTO NETWORKS

Image source: Palo Alto Networks
Brief: Palo Alto Networks simulated attacks on AI agents built with CrewAI and AutoGen frameworks to explore vulnerabilities like data leaks, credential theft, and tool misuse. The cybersecurity firm then outlined defense strategies.
Breakdown:
Enforce safeguards in agent instructions to block out-of-scope prompts. Deploy content filters to detect prompt injection attempts at runtime.
Sanitize tool inputs, apply strict access controls and perform routine security testing, such as Dynamic Application Security Testing (DAST).
Enforce strong sandboxing with network restrictions, syscall filtering and least-privilege container configurations.
Use a data loss prevention (DLP) solution, audit logs and secret management services to protect sensitive information.
Combine multiple safeguards across agents, tools, prompts and runtime environments to build resilient defenses.
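As one concrete illustration of the runtime safeguards above, here is a minimal sketch of a content filter that screens tool inputs for prompt-injection phrasing before execution. The patterns and function names are hypothetical, not Palo Alto Networks’ code; real deployments layer this with DLP, sandboxing, access controls, and testing such as DAST.
```python
# Illustrative sketch: a naive runtime filter that screens tool inputs for
# common prompt-injection phrasing before an agent is allowed to call a tool.

import re

INJECTION_PATTERNS = [
    r"ignore (all|previous) instructions",
    r"reveal (your )?(system prompt|credentials|api key)",
    r"disable (the )?(guardrails|filters)",
]

def is_suspicious(text: str) -> bool:
    return any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)

def guarded_tool_call(tool, tool_input: str):
    if is_suspicious(tool_input):
        # Block and log instead of passing the input through to the tool.
        raise ValueError("Potential prompt injection detected; request blocked.")
    return tool(tool_input)

# Example: wrapping a harmless lookup tool.
lookup = lambda q: f"results for: {q}"
print(guarded_tool_call(lookup, "order status for invoice 1042"))
```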
Why it’s important: As AI agents see broader real-world adoption, understanding their security implications is critical. Most vulnerabilities are framework-agnostic, rooted in insecure design patterns, misconfigurations, and unsafe tool integrations, not in the frameworks themselves.
Full playbook here.
BOSTON CONSULTING GROUP

Image source: Boston Consulting Group
Brief: Boston Consulting Group (BCG) published a 37-slide report on AI agents, covering how they are evolving, where they have product-market fit, how reliable and effective they can be, MCP’s role in agentic workflows, and building at scale.
Breakdown:
BCG explores how agents are moving beyond simple 'if-statements' toward more autonomous agents and multi-agent systems.
It outlines how coding agents are the first to reach product-market fit, with organizations realizing significant value from agentic workflows.
Bloomberg’s compliance agents rigorously check facts and identify edge-case risks, reducing time-to-decision by 30–50%.
BCG details six key dimensions for tracking agent performance, including reasoning and planning, task autonomy and execution, and more.
It explains how MCP helps unlock agentic workflows through one unified protocol and highlights the emerging role of agent-to-agent protocols.
Why it’s important: Less than half a year after its launch by Anthropic, the Model Context Protocol (MCP) has been rapidly adopted by OpenAI, Microsoft, and others. BCG’s effort to unpack MCP’s role, and its significance as a step toward broad production use of agentic systems, makes this a valuable read.
Full playbook here.
LANGCHAIN

Image source: LangChain
Brief: Harrison Chase, CEO of LangChain, the popular framework for building agents, responded to OpenAI’s new agent guide and Anthropic’s earlier release, sharing his perspective on agentic systems and how frameworks support them.
Breakdown:
Chase critiques OpenAI’s vague agent definition, favoring Anthropic’s precise framing of agentic systems as workflows, agents, or both.
OpenAI and Anthropic both note that agents aren't always needed; workflows are often faster, cheaper, more reliable, and simpler to implement.
Chase discusses the spectrum of “agentic” behavior, where systems vary in how agent-like they are, depending on their use of workflows and agents.
Agentic systems, whether workflows or agents, share many common features that can be provided by a framework or built from scratch.
Chase also shared a comparison of 14 agentic frameworks, evaluating capabilities in AutoGen, OpenAI’s Agents SDK, CrewAI, and others.
Why it’s important: Amid the market hype, posturing and noise, even leaders like OpenAI, Anthropic, and LangChain offer nuanced views, in part reflecting the pace of change in the space. With little precise analysis on agents and frameworks, this contribution offers timely and valuable insight.
Full playbook here.
OPENAI

Image source: OpenAI
Brief: OpenAI published a 34-page guide to building AI agents, drawing on insights from its customer deployments. It covers opportunity identification, agent design, and best practices for ensuring safe and effective performance.
Breakdown:
Advances in reasoning, multimodality, and tool use have led to LLM-powered agents, systems that independently perform tasks.
OpenAI recommends targeting use cases that have resisted traditional automation: complex decision-making, brittle rule-based systems, or heavy reliance on unstructured data.
At its core, an agent has three components: model, tools, and instructions. It requires three types of tools: data, action, and orchestration.
Orchestration: single-agent and multi-agent systems with Manager (agents as tools) and Decentralized (agents handing off to agents) patterns.
Set up guardrails to address identified use case risks, as shown in the diagram above, and add more as new vulnerabilities are discovered.
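A minimal sketch of that anatomy, assuming a hypothetical `call_llm` stub rather than OpenAI’s actual SDK: a model guided by instructions, a small tool registry, and a relevance guardrail in front of the loop.
```python
# Minimal sketch of the model / tools / instructions anatomy with a guardrail.
# `call_llm` is a hypothetical placeholder, not an actual OpenAI SDK call.

def call_llm(instructions: str, user_input: str, tools: dict) -> dict:
    # Placeholder: a real implementation would call a model API and let it
    # choose a tool. Here we fake a single tool selection for illustration.
    return {"tool": "refund_lookup", "args": {"order_id": user_input}}

TOOLS = {
    "refund_lookup": lambda order_id: f"refund status for {order_id}: pending",
}

INSTRUCTIONS = "You are a support agent. Only handle refund questions."

def relevance_guardrail(user_input: str) -> bool:
    # Simplistic relevance check; real guardrails might use a classifier.
    return "refund" in user_input.lower() or user_input.isdigit()

def run_agent(user_input: str) -> str:
    if not relevance_guardrail(user_input):
        return "Out of scope; escalating to a human."
    step = call_llm(INSTRUCTIONS, user_input, TOOLS)
    return TOOLS[step["tool"]](**step["args"])

print(run_agent("12345"))  # refund status for 12345: pending
```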
Why it’s important: Agents mark a new era in automation, where systems can reason through ambiguity, take action across tools, and handle multi-step tasks with a high degree of autonomy. This guide offers the foundational knowledge to start delivering enterprise value with agents.
Full playbook here.
GOOGLE

Image source: Google
Brief: Google’s 76-page “Agents Companion” paper builds on its original “Agents” paper, exploring the operationalization of gen AI agents. It covers AgentOps, evaluation, Agentic RAG, multi-agent systems, real-world case studies, and more.
Breakdown:
In operationalizing agents, Google emphasizes the importance of metrics within AgentOps to help build, monitor, and compare agent revisions.
For agent evaluation, it focuses on assessing agent capabilities, evaluating trajectory and tool use, and evaluating the final response.
AI is evolving towards multi-agent systems, specialized agents working together to achieve complex goals. The paper outlines design patterns.
Agentic RAG architecture is highlighted: autonomous retrieval agents that actively refine their searches through iterative reasoning (a minimal loop is sketched after this breakdown).
Google showcases how specialized agents collaborate to power in-car conversational AI, illustrating real-world multi-agent systems in action.
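Here is a minimal sketch of that agentic RAG loop, with hypothetical `retrieve`, `is_sufficient`, and `refine_query` stubs standing in for a vector store and LLM judgments; it is not Google’s implementation.
```python
# Minimal agentic RAG sketch: retrieve, judge whether the evidence suffices,
# rewrite the query, and repeat before answering. All helpers are stubs.

def retrieve(query: str) -> list[str]:
    corpus = {"q3 revenue drivers": ["Growth was driven by enterprise subscriptions."]}
    return corpus.get(query, [])

def is_sufficient(docs: list[str]) -> bool:
    return len(docs) > 0          # a real agent would ask an LLM to grade relevance

def refine_query(query: str) -> str:
    return f"{query} drivers"     # a real agent would rewrite the query with an LLM

def agentic_rag(question: str, max_steps: int = 3) -> str:
    query = question
    for _ in range(max_steps):
        evidence = retrieve(query)
        if is_sufficient(evidence):
            return " ".join(evidence)
        query = refine_query(query)   # iterative refinement instead of one-shot RAG
    return "No supporting evidence found."

print(agentic_rag("q3 revenue"))  # refined to "q3 revenue drivers" on the second pass
```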
Why it’s important: Gen AI agents mark a leap beyond standalone LLMs, enabling dynamic problem-solving and interaction. This “102” guide builds on core concepts, offering in-depth exploration of agent evaluation methods and practical applications to help enterprises operationalize agents in production.
Full playbook here.
GALILEO

Image source: Galileo
Brief: Galileo, a company that specializes in AI evaluation, released a 93-page guide on mastering AI agents. It covers agent capabilities, real-world use cases, and frameworks, with a strong focus on performance evaluation.
Breakdown:
Chapter 1 introduces AI agents, their ideal uses, and scenarios where they may be overkill. It includes real-world cases from Salesforce and Oracle Health.
Chapter 2 details frameworks: LangGraph, Autogen, and CrewAI, providing selection criteria and case studies of companies using each.
Chapter 3 explores how to evaluate an AI agent through a step-by-step example using a finance research agent.
Chapter 4 covers measuring agent performance across systems, task completion, quality control, and tool interaction, with five detailed use cases.
Chapter 5 addresses why many AI agents fail and provides practical solutions for successful AI deployment.
Why it’s important: As AI agents become more prevalent, ensuring they work correctly and safely is key. This is where evaluation comes in. Galileo’s previous guide focused on "Mastering RAG," building enterprise-grade systems. Now, they’ve taken it further with agents using LLMs to complete broader, more complex tasks.
Full playbook here.
ANTHROPIC

Image source: Anthropic
Brief: Anthropic published an article sharing best practices from building effective agents with teams across industries, identifying seven common agentic system patterns in production and when to use them.
Breakdown:
Anthropic distinguishes between two types of agentic systems: workflows, where LLMs and tools follow predefined code paths, and agents, where LLMs dynamically direct their own processes and tool usage, controlling task execution.
Anthropic is seeing seven common agentic system patterns in production: Augmented LLM (building block), prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer (all workflows), and agents.
For instance, in the orchestrator-workers workflow, a central LLM divides tasks, assigns them to worker LLMs, and combines their results, making it ideal for tasks like complex coding and data searches across multiple sources (a minimal sketch follows this breakdown).
Agents are emerging as LLMs advance in understanding complex inputs, reasoning, planning, tools, and error recovery. Ideal for open-ended problems where the steps are unpredictable and a fixed path can't be hardcoded.
For full details on all seven patterns, including architecture diagrams, usage guidance, and implementation examples, see the summary or full article.
The full article also dives into customer support and coding agents, which have shown particular promise, as well as prompt engineering best practices and the use of frameworks like LangChain (see Anthropic’s cookbook).
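As a rough illustration of the orchestrator-workers pattern, here is a minimal sketch in which `plan_subtasks` and `run_worker` are hypothetical stand-ins for LLM calls; Anthropic’s article and cookbook contain the authoritative diagrams and implementations.
```python
# Minimal orchestrator-workers sketch: an orchestrator splits a task into
# subtasks, workers handle them in parallel, and the results are combined.
# `plan_subtasks` and `run_worker` stand in for LLM calls (hypothetical stubs).

from concurrent.futures import ThreadPoolExecutor

def plan_subtasks(task: str) -> list[str]:
    # Orchestrator step: in practice an LLM decomposes the task dynamically.
    return [f"{task} :: research", f"{task} :: draft", f"{task} :: review"]

def run_worker(subtask: str) -> str:
    # Worker step: in practice each worker is an LLM call with its own prompt.
    return f"done({subtask})"

def orchestrate(task: str) -> str:
    subtasks = plan_subtasks(task)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_worker, subtasks))
    # Synthesis step: combine worker outputs into a single answer.
    return "\n".join(results)

print(orchestrate("summarize competitor filings"))
```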
Why it’s important: Anthropic's experience in building agents offers best practices to help enterprises leverage gen AI. These patterns can be adapted and combined for various use cases. While more agentic complexity can improve performance, it often increases latency and cost, so such tradeoffs should be considered.
Full playbook here.
MENLO VENTURES

Image source: Menlo Ventures
Brief: Menlo Ventures’ article ‘AI Agents: A New Architecture for Enterprise Automation’ explores six gen AI examples, from RAG to autonomous agents, detailing reference architectures and levels of autonomy. Architecture summary here.
Breakdown:
The fully autonomous agents of tomorrow might possess all four building blocks of AI agents: reasoning, external memory, execution, and planning. Today’s LLM apps and agents do not.
The popular RAG architecture isn’t agentic but relies on reasoning and external memory. The key distinction is that these apps use the LLM as a “tool” for search, synthesis, or generation, while their steps remain pre-determined by code.
By contrast, agents emerge when the LLM controls the application's flow, dynamically deciding actions to take, tools to use, and how to interpret and respond to inputs. Menlo Ventures outlines three types of agents.
Decisioning Agent: At the most constrained end are “decisioning agent” designs, which use LLMs to traverse predefined decision trees.
Agent on Rails: “Agents on rails” offer more freedom with a higher-level objective, but constrain the solution space with a standard operating procedure (SOP) and a predetermined library of tools to choose from.
General AI Agents: At the far end of the spectrum are “general AI agents”, essentially for-loops with minimal data scaffolding, relying entirely on the LLM’s reasoning for planning, reflection, and course correction.
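To illustrate the “for-loop with minimal scaffolding” framing, here is a bare-bones sketch with a hypothetical `call_llm` stub; a real general agent would let the model plan, use tools, reflect, and course-correct on each pass.
```python
# Bare-bones sketch of a "general AI agent" as a loop: the model plans,
# acts, reflects, and repeats until it declares the goal met or a step
# budget runs out. `call_llm` is a hypothetical stub, not a real API.

def call_llm(goal: str, history: list[str]) -> dict:
    # Stub: pretend the model finishes after two reflection cycles.
    done = len(history) >= 2
    return {"action": "finish" if done else "continue",
            "output": f"step {len(history) + 1} toward: {goal}"}

def general_agent(goal: str, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):          # the "for-loop" doing the heavy lifting
        step = call_llm(goal, history)  # plan / act / reflect in one stub call
        history.append(step["output"])
        if step["action"] == "finish":  # stop or course-correct based on the model
            break
    return history

print(general_agent("compile a market scan"))
```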
Why it’s important: Gen AI is entering its agents era, marking a shift toward greater autonomy and sophistication in AI systems. To fully leverage their potential, it’s key to understand agentic designs and the nuances of varying definitions emerging across the market.
Full report here.
ADDITIONAL REPORTS
KPMG - Agentic AI advantage
BCG - Preparing for an AI-first future
IBM - Enterprise AI Agents
a16z - Insights for enterprise AI builders
Anthropic - Building trusted AI in the enterprise
McKinsey - Why agents are the next frontier of generative AI
Google - Agents
WEF - The rise of AI agents
OWASP - Agentic AI threats and mitigations
UiPath - Preparing for the agentic era
Capgemini - Agentic AI supply chains
MUST-READS WITH MORE DETAILED BREAKDOWNS
Agentic AI Case Studies (19 cases)
AI Strategy Playbooks (16 playbooks)
Enterprise AI Case Studies (20 cases)
Playbooks for AI Leaders (16 playbooks)
Enterprise AI Market (10 reports)
LEVEL UP WITH GENERATIVE AI ENTERPRISE
Generative AI is evolving rapidly in the enterprise, driving a new era of transformation through agentic applications.
Twice a week, we review hundreds of the latest generative and agentic AI best practices, case studies, and innovation insights to bring you the top 1%...
Explore sample editions:
All the best,

Lewis Walker
Found this valuable? Share with a colleague.
Received this email from someone else? Sign up here.
Let's connect on LinkedIn.