Human-Orchestrated Agents for Better Autonomy

Today’s agents cannot be autonomous for very long. After several turns, an agent will get confused, forget instructions, or begin to repeat itself. Even the best in the biz, o1, only manages about 10,000 tokens before it loses the thread. If agents could work for longer, we would be several more steps down the road to AGI.

But for now, we need a different solution to get valuable output from our AI.

Human-in-the-Loop

I’m sure you’ve noticed the text at the bottom of almost any chat product, “This product uses AI. Check for mistakes.” You have since blocked this text out of your mind, because these tools wouldn’t be useful if we spent all our newly saved time fact-checking them. But the idea is that you should verify the text before using it anywhere important.

We similarly design Microsoft 365 Copilot to be unable to do anything important without you reviewing it. That’s why Copilot doesn’t send email on your behalf, but creates and opens a draft email for you to send. The same goes for plugins to external databases and SaaS products: you have to verify what each one will do. We call this “Human in the [decision] Loop.” It’s extremely important that the decision-maker is something less prone to hallucination.
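To make the draft-then-approve idea concrete, here is a minimal sketch of an approval gate. It is not how Copilot is implemented; `DraftEmail`, `send_with_approval`, and the `approve` callback are hypothetical names I made up for illustration, and the "send" is just a flag flip standing in for a real mail call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DraftEmail:
    """A message the AI has prepared but not sent (hypothetical type)."""
    to: str
    subject: str
    body: str
    sent: bool = False


def send_with_approval(draft: DraftEmail,
                       approve: Callable[[DraftEmail], bool]) -> DraftEmail:
    """Send only if the human reviewer approves; otherwise keep it a draft."""
    if approve(draft):
        draft.sent = True  # stand-in for a real send call (assumption)
    return draft


# The AI proposes; the human decides. Here the reviewer declines.
draft = DraftEmail(to="customer@example.com",
                   subject="Follow-up",
                   body="Thanks for meeting with us.")
rejected = send_with_approval(draft, approve=lambda d: False)
```

In a real product the `approve` callback would be a person clicking Send on the opened draft, not a lambda.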

We can apply the same principles to autonomous agents.

Human-in-Charge

The problem faced by autonomous LLMs is not “hallucination” per se, but it does have the same underlying cause. At some point the probability-based token selector takes a bad path, and the output invariably trends toward garbage. Our job as the supervisor for an agent is to correct any missteps and get it back on the right path.

The “Planner” pattern that I used in this video is a helpful method to do so:

The Planner is another agent whose job is to set up the detailed plan for sub-agents to execute. Because the plan provides structure, the work cannot get further off the rails than a single step. And if you make sure that the overall plan sequence is right, it’s easier to get an agent back on the rails. When a step fails, simply direct the AI to redo that step.
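The redo-one-step idea can be sketched in a few lines. This is only an illustration under my own assumptions: `Step`, `run_plan`, and the `review` callback are hypothetical names, the sub-agents are plain functions standing in for LLM calls, and in practice `review` would be a human looking at the output rather than a lambda.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    agent: str  # name of the sub-agent assigned to this step
    task: str   # what that sub-agent should do


def run_plan(steps: List[Step],
             agents: Dict[str, Callable[[str, int], str]],
             review: Callable[[Step, str], bool],
             max_attempts: int = 3) -> List[str]:
    """Run steps in order; redo a step until the reviewer accepts its output."""
    results = []
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            output = agents[step.agent](step.task, attempt)
            if review(step, output):
                results.append(output)
                break  # accepted, move on to the next step
        else:
            # No attempt was accepted: stop instead of building on bad output.
            raise RuntimeError(f"Step failed after {max_attempts} attempts: {step.task}")
    return results


def flaky_notes_agent(task: str, attempt: int) -> str:
    """Stub sub-agent that goes off the rails on its first attempt."""
    return "garbage" if attempt == 1 else f"done: {task}"


agents = {"Sales Notes": flaky_notes_agent}
plan = [Step("Sales Notes", "Write an opportunity update")]
results = run_plan(plan, agents, review=lambda step, out: out != "garbage")
```

The point of the structure is that a bad output only costs you one step: the earlier accepted results are untouched, and the rest of the plan has not started yet.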

Could another AI take on this role of reviewing the output of each step? Some early projects suggest that doing so extends the useful runtime of an autonomous agent.

AI-in-charge?

If the problem is a bad probability path, why not have another AI play the part of the human? This does in fact work! And you’ve hit upon the reason why the latest tech buzz is about multi-agent systems, aka agent swarms. Having an independent AI get other AIs back on the rails is the most successful paradigm right now for extending useful autonomy. It’s no wonder that everyone is exploring how to get there.
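Here is a toy sketch of what "an independent AI gets another AI back on the rails" might look like. It is not AutoGen's API or any framework's; `supervised_run`, `worker`, and `reviewer` are hypothetical names, and both agents are plain functions standing in for separate LLM calls.

```python
from typing import Callable, Optional


def supervised_run(task: str,
                   worker: Callable[[str, Optional[str]], str],
                   reviewer: Callable[[str, str], Optional[str]],
                   max_rounds: int = 3) -> str:
    """Let a reviewer agent send the worker back with feedback until satisfied."""
    feedback = None
    for _ in range(max_rounds):
        output = worker(task, feedback)
        feedback = reviewer(task, output)  # None means the reviewer approves
        if feedback is None:
            return output
    raise RuntimeError(f"Reviewer never approved the output for: {task}")


def worker(task: str, feedback: Optional[str]) -> str:
    """Stub worker that drifts off-task until it receives feedback."""
    if feedback is None:
        return "Meeting notes: (repeating itself...)"
    return f"Meeting notes for: {task}"


def reviewer(task: str, output: str) -> Optional[str]:
    """Stub reviewer that only checks the output mentions the task."""
    return None if task in output else f"Stay on the task: {task}"


final = supervised_run("Contoso renewal", worker, reviewer)
```

The reviewer here is deliberately simple. The hard part in practice is that the reviewer is also an LLM and can take a bad path of its own.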

AutoGen was one of the first multi-agent frameworks, and the team recently announced v2 of their architecture. AutoGen is now going to get merged into Semantic Kernel and included in .NET. The AutoGen repo has an excellent example of a planner and sub-agent.

Despite swift progress, this pattern is not yet robust. In fact, I haven’t seen any person or company using an autonomous multi-agent pattern in production. I expect that we’ll get there, but for now the right pattern is to have a human in charge of reviewing each step.

Code for the Planner Pattern

Here are the Instructions I used for a similar demo, again in Copilot for Microsoft 365. I haven’t optimized it, but note the basics:

Use Markdown
Start with a high-level purpose and give context
List specific execution steps
Include other information the agent needs
Give an example (few-shot)

# Purpose
You are the Planner agent. Your role is to work with the user to develop a specific and detailed plan to accomplish a job through a team of agents. You will not accomplish the job yourself. In this instance, you will help a salesperson report on and follow up on their meetings with customers.
# Execution
1. The user will ask for help generating meeting notes, reporting the sales opportunity, and identifying next steps
2. You will respond with a markdown ordered list of a plan, with each step assigned to a specific sub-agent. In the same turn with the plan, you will ask the user whether they have any edits to make or whether you should begin the plan.
3. If the user has a change to make to the plan, make the change and output the markdown ordered list of the plan again. Continue this until the user is satisfied.
# Guidance
* Every step of the plan must be assigned to one of the following sub-agents: Sales Notes, Sales Planner, BizChat, User. Here are their details:
    * Sales Notes: Expert sub-agent on converting meeting transcripts into appropriate sales opportunity updates in the Sales system-of-record.
    * Sales Planner: Expert sub-agent on identifying follow-up tasks. This sub-agent will output a list of actions to perform, such as writing a sales pitch, scheduling a follow-up meeting, or obtaining a discount for the customer.
    * BizChat: Expert sub-agent on handling Microsoft 365 and Microsoft Office tasks. Use this agent to schedule a meeting, create a document, or draft an email message.
    * User: You can also get input from the user. You should include a step to get feedback or confirmation from the user before any system is updated or communication is sent.
# Example interactions
## Example 1
<User>: Update the Opportunity record based on this meeting <meeting information attached>
<Planner>: I'm happy to help with that. Here is my plan:
1. *Sales Notes*: Write an opportunity update based on the meeting transcript
2. *User*: Provide feedback on the opportunity update
3. *Sales Notes*: Update the Contoso Opportunity in the Sales application
4. *Sales Planner*: Identify the best next steps based on the meeting transcript
5. *User*: Update or confirm the next steps
6. *BizChat*: Create any draft documents, emails, or meeting invitations according to the next steps

You would need to modify this for your own scenario, as well as write similar instructions for each sub-agent. You could also make a generic Planner if you included a longer list of agents; let me know if you try it!
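If you want to hand the Planner's reply to code, you need to turn that markdown ordered list into structured steps. Here is one rough way to do it, assuming the Planner sticks to the `N. *Agent*: task` shape from the example above. `parse_plan`, `ALLOWED_AGENTS`, and `STEP_PATTERN` are names I made up for this sketch, and a real Planner will not always format its reply this cleanly.

```python
import re
from typing import List, Tuple

# Sub-agent names taken from the Guidance section of the instructions.
ALLOWED_AGENTS = {"Sales Notes", "Sales Planner", "BizChat", "User"}

# Matches lines such as "1. *Sales Notes*: Write an opportunity update".
STEP_PATTERN = re.compile(r"^\s*(\d+)\.\s*\*(.+?)\*:\s*(.+)$")


def parse_plan(reply: str) -> List[Tuple[str, str]]:
    """Extract (agent, task) pairs from a Planner reply, in order."""
    steps = []
    for line in reply.splitlines():
        match = STEP_PATTERN.match(line)
        if not match:
            continue  # skip the greeting and any other non-step lines
        agent, task = match.group(2).strip(), match.group(3).strip()
        if agent not in ALLOWED_AGENTS:
            # Reject plans that assign work to an agent we do not have.
            raise ValueError(f"Unknown sub-agent: {agent}")
        steps.append((agent, task))
    return steps


reply = """I'm happy to help with that. Here is my plan:
1. *Sales Notes*: Write an opportunity update based on the meeting transcript
2. *User*: Provide feedback on the opportunity update
3. *Sales Notes*: Update the Contoso Opportunity in the Sales application
"""
plan = parse_plan(reply)
```

Rejecting unknown agent names up front is a cheap check that the plan is executable before any sub-agent starts work.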