How to Build AI Agents to Be Effective

TARS from Interstellar is very effective! From SkJaack on ArtStation

I have been working in AI at Microsoft to improve information worker productivity since 2016. Because my work is on our developer platforms, I’ve helped dozens of AI features ship… and watched them fail to have significant impact or success. Some of them are actually pretty annoying! I believe that AI agents are the first intelligent application that will have massive impact on the world’s productivity. Why? Because I’ve learned the secret of effective and empowering AI.

The answer is to equip your AI agents with the capabilities to handle menial or complicated work, saving the user time and brainpower. I assume your agent is already modeled after a specific business role or process, and you’ve identified tasks that benefit from conversation. Let’s turn our attention to how you can make it truly effective at those tasks.

I’ll describe how to be intentional with your agents by using:

Workflow details
Knowledge
Skills:
- APIs
- Search
- Code interpreter

Workflow details

Identifying, codifying, and improving how business gets done is a common enough idea that it gets its own term: Business Process Management (BPM). Through a combination of analysis tools like Viva Insights, workflow automation tools like Zapier, and Robotic Process Automation tools like UiPath, businesses have been trying to map their processes, remove bottlenecks, and streamline operations for decades.

In order to automate a system, you must understand the system in extreme detail, including every edge case and path through the steps. This is painstaking work, deep in the realm of highly paid consultants. Of course this means that only the most critical and time-consuming processes get the treatment. What about everything else?

AI agents do not remove the need for BPM, but they do make it a lot easier and available to any person and any process. AI agents can understand your workflow from a simple description. I start by describing the usual steps of the process at a high level. For example, “You’re writing an RFP, which needs to be approved by the deal desk before being sent to the agency.” Then I describe the purpose of each step and the purpose of the process overall. Go ahead and include details and minutiae if relevant; anything that you include is something that the user won’t have to remember.
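Here is a minimal sketch of how such a description could be assembled into agent instructions. The `Step` class, the `build_instructions` helper, and the step purposes are hypothetical illustrations built around the RFP example; most agent builders simply accept the resulting text in an instructions field.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    purpose: str


def build_instructions(process: str, goal: str, steps: list[Step]) -> str:
    """Render a workflow description as plain-language agent instructions."""
    lines = [
        f"You are helping with this process: {process}",
        f"Overall purpose: {goal}",
        "Steps:",
    ]
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {step.name} (purpose: {step.purpose})")
    return "\n".join(lines)


# The purposes below are invented for illustration.
rfp_instructions = build_instructions(
    process="Writing an RFP that the deal desk approves before it goes to the agency.",
    goal="Get comparable, complete proposals back from the agency.",
    steps=[
        Step("Draft the RFP", "state the requirements clearly"),
        Step("Deal desk approval", "check pricing and terms before anything is sent"),
        Step("Send to the agency", "start the response clock with an approved document"),
    ],
)
print(rfp_instructions)
```

Writing the purpose next to each step is the important part: it gives the model something to reason from when the input does not match the usual path.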

The resulting AI will be more flexible and useful than a traditional automation. Changes in input, formatting errors, or even changes in execution won’t faze an AI that understands the purpose of the process and the steps. The ability for everyone to replace their menial processes with agents is going to supercharge the success of Business Process Management, but the most effective agents need more than this.

Knowledge

Modern LLMs can reason over huge amounts of input data. The ~100,000 words supported by GPT-4o could represent an entire novel! Agents in Microsoft 365 Copilot allow for up to 300 pages of data to be included as “knowledge”. It’s critical that you add as much content as will be useful, but don’t add anything that’s irrelevant.

The way the knowledge works technically is that these 300 pages are added to what the user types in, almost like they had typed it themselves. If you have a set of related documents that is dozens to low-hundreds of pages, the documents may make a great agent just like that. An employee handbook agent is a good place to start. There’s a reason that “chat with your document” is a common pattern. No human can remember all of that information, but an AI can!
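A rough sketch of that mechanism, under stated assumptions: the `build_prompt` helper and the 500-words-per-page estimate are hypothetical, and a real platform handles this assembly for you. The point is only that knowledge ends up in the same input as the user's message.

```python
WORDS_PER_PAGE = 500  # assumed average; real documents vary
MAX_PAGES = 300       # page limit quoted in the text above


def build_prompt(knowledge_docs: dict[str, str], user_message: str) -> str:
    """Prepend every knowledge document to the user's message."""
    total_words = sum(len(text.split()) for text in knowledge_docs.values())
    if total_words > WORDS_PER_PAGE * MAX_PAGES:
        raise ValueError(f"Knowledge is about {total_words} words, over the assumed budget")
    sections = [f"[Document: {title}]\n{text}" for title, text in knowledge_docs.items()]
    return "\n\n".join(sections + [f"[User]\n{user_message}"])


prompt = build_prompt(
    {"Employee handbook": "Full-time employees accrue 20 vacation days per year."},
    "How many vacation days do I get?",
)
print(prompt)
```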

On the other hand, don’t fill up the context window just because it’s there. Unrelated or low quality data will make the model perform worse. Include any content that will often be helpful to the AI. If it isn’t performing as well as you want it to, you may be including detrimental content.

One of my favorite tricks is to insert a database directly into the model context, with an XML, YAML, or JSON file. Just for fun, I uploaded a 10,000-line JSON document of 300 records and asked Copilot about it:

This works great for retrieval. It will also understand XSLT and JSONPath just fine if you know those, but this particular query would have been a real pain to implement in JSONPath. Keep in mind that there are still limits. For example, Copilot undercounted how many people in the data have turquoise eyes: it answered 8 when the correct count is 12.

But that’s where skills come in.
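To see why, consider the counting question above. Ordinary code answers it exactly, every time. The sketch below uses a few made-up records; the `personal` and `eye_color` field names are assumptions, since the real file's structure is not shown in full.

```python
import json

# Made-up sample records standing in for the uploaded JSON document.
records_json = """
[
  {"id": 1, "first_name": "Ana", "personal": {"eye_color": "turquoise"}},
  {"id": 2, "first_name": "Ben", "personal": {"eye_color": "brown"}},
  {"id": 3, "first_name": "Cy", "personal": {"eye_color": "turquoise"}}
]
"""

people = json.loads(records_json)

# Count matches deterministically instead of asking the model to tally them.
turquoise_count = sum(
    1 for person in people
    if person.get("personal", {}).get("eye_color") == "turquoise"
)
print(turquoise_count)
```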

Skills

Skills (sometimes called functions, tools, or actions) work differently than instructions and knowledge grounding. A skill is a function the LLM can call to achieve something it cannot do on its own. Defining a skill in an AI agent means describing what it should be used for and what properties it needs. When the LLM decides to use it, it constructs a valid function call. This is the only way the agent can interact with the world outside of its chat response. For example, the LLM may output:

search_people(name="Milly Pfiffer")

The system code that calls the LLM, also known as the orchestrator, recognizes this valid function call, runs it on behalf of the LLM, and appends the results as the next tokens. To the LLM, it looks like it used a skill itself and got the results back. The result could look like this:

search_people(name = "Milly Pfiffer")  
 {
    "id": 140,
    "first_name": "Milly",
    "last_name": "Pfiffer",
    "email": "[email protected]",
    "personal": {
...
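The orchestrator's side of this exchange can be sketched in a few lines. Everything here is a simplified assumption: the `SKILLS` registry, the `run_skill_call` helper, and the idea that the model's call arrives as an already-parsed dictionary are illustrative, not any particular platform's API. The sample record uses only fields shown above.

```python
import json

# Hypothetical in-memory directory; only fields shown in the text are used.
PEOPLE = [{"id": 140, "first_name": "Milly", "last_name": "Pfiffer"}]


def search_people(name: str) -> list[dict]:
    """Return every record whose full name matches, ignoring case."""
    wanted = name.casefold()
    return [
        p for p in PEOPLE
        if f"{p['first_name']} {p['last_name']}".casefold() == wanted
    ]


# What the model is told about the skill: its purpose and the properties it needs.
SKILLS = {
    "search_people": {
        "description": "Look up a person by full name.",
        "parameters": {"name": "Full name, for example 'Milly Pfiffer'"},
        "function": search_people,
    }
}


def run_skill_call(call: dict) -> str:
    """Validate a call the model produced, run it, and return JSON for the model to read."""
    skill = SKILLS.get(call.get("skill"))
    if skill is None:
        return json.dumps({"error": f"unknown skill: {call.get('skill')}"})
    arguments = call.get("arguments", {})
    if set(arguments) != set(skill["parameters"]):
        return json.dumps({"error": "arguments do not match the skill definition"})
    return json.dumps(skill["function"](**arguments))


# Assumed shape of a parsed model output requesting the skill.
model_output = {"skill": "search_people", "arguments": {"name": "Milly Pfiffer"}}
print(run_skill_call(model_output))
```

Returning errors as data rather than raising lets the model see what went wrong and try a corrected call.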

It is difficult to use skills well, however. Getting the agent to choose the correct skill and specify its parameters consistently and reliably can take a lot of work. If this is part of your scenario, I recommend implementing an eval/test suite early. One more tip: keep in mind that LLMs can produce entire SQL queries very easily, as long as they are told the table schemas and the purpose of the data. Just make sure the orchestrator that implements the actual function call doesn’t pass along any destructive operations!
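One conservative way to enforce that last point is to accept only a single SELECT statement. The sketch below uses Python's built-in `sqlite3` module; the `run_read_only` helper and the `people` table are hypothetical, and the check is deliberately strict (it also rejects harmless queries that contain a semicolon inside a string literal). A database account with read-only permissions is a stronger safeguard where available.

```python
import sqlite3


def run_read_only(connection: sqlite3.Connection, sql: str) -> list[tuple]:
    """Run model-generated SQL only if it is a single SELECT statement."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return connection.execute(statement).fetchall()


# Hypothetical table for demonstration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (id INTEGER, first_name TEXT)")
db.execute("INSERT INTO people VALUES (140, 'Milly')")

print(run_read_only(db, "SELECT first_name FROM people WHERE id = 140;"))

try:
    run_read_only(db, "DROP TABLE people")
except ValueError as error:
    print(f"blocked: {error}")
```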

Skills are critical for many AI agents, both to retrieve and to change information from other software. Some platforms build these skills in and make sure they work well for you.

Scoped search

Many AI chat products now have access to the entire internet, and AI at work has access to all documents produced by all employees throughout the business’s history. That is a huge amount of data! This is powerful, but can actually be detrimental when there’s a specific workflow you want to assist. When your AI agent is defined according to a role, you can scope the search domain to what is relevant to the role.

Your HR agents shouldn’t be able to access lunch menus and event plans. Instead, limit their search capability to the policy folders and FAQ sites. This prevents them from returning external information as internal fact. The best AI products are far less likely to hallucinate when they are grounded in the right information, and retrieving only valid information is how you ground responses accurately.
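As an illustration of scoping, here is a small sketch that filters a search to approved locations before matching. The `Document` class, the folder paths, and the naive substring match are all assumptions made for the example; a real agent platform exposes scoping as a configuration setting over its own search index.

```python
from dataclasses import dataclass


@dataclass
class Document:
    path: str
    text: str


# Hypothetical locations an HR agent is allowed to search.
HR_SCOPE = ("/hr/policies/", "/hr/faq/")


def scoped_search(query: str, documents: list[Document], scope: tuple[str, ...]) -> list[Document]:
    """Return documents inside the scope whose text contains the query, ignoring case."""
    needle = query.casefold()
    return [
        doc for doc in documents
        if doc.path.startswith(scope) and needle in doc.text.casefold()
    ]


# Made-up corpus: both documents mention "leave", but only one is in scope.
corpus = [
    Document("/hr/policies/leave.txt", "Parental leave is 16 weeks."),
    Document("/cafeteria/menu.txt", "Leave room for dessert."),
]

for hit in scoped_search("leave", corpus, HR_SCOPE):
    print(hit.path)
```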

Code interpreter

The name “code interpreter” can be a scary one if you are not very technical. It shouldn’t be! Think of code interpreter as the mechanism for doing everything computers were already good at a few years ago, before they could talk with us. Non-AI systems are great at applying mathematical formulas to thousands of cells in a spreadsheet, determining stock price trends, and performing other complex calculations. A better name for this capability might be “calculations”! Who wouldn’t turn “calculations” on for their AI agent?
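To make “calculations” concrete, this is the kind of short script an agent might write and run when asked about a price trend. It is only a sketch: the prices are made-up sample data and the `moving_average` helper is a hypothetical name.

```python
from statistics import mean


def moving_average(prices: list[float], window: int) -> list[float]:
    """Average each run of `window` consecutive prices."""
    if window < 1 or window > len(prices):
        raise ValueError("window must be between 1 and the number of prices")
    return [mean(prices[i:i + window]) for i in range(len(prices) - window + 1)]


closing_prices = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0]  # made-up sample data
averages = moving_average(closing_prices, window=3)

# Compare the last smoothed value with the first to describe the trend.
trend = "up" if averages[-1] > averages[0] else "down or flat"
print(averages, trend)
```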

Code interpreter is even more powerful than that, however. When it is enabled, the AI can do anything that code can do. For example, did you know that code interpreter can create Word and PowerPoint documents? Microsoft 365 Copilot or ChatGPT will look up information and use Python code to generate the actual DOCX or PPTX files, then give you a download link. This isn’t as good as Copilot in Word’s draft or PowerPoint’s narrative builder, but I use it when I need to also work with other types of data.

Code interpreter runs in a locked-down environment, so it isn’t accessing the internet. To use it well, make sure that the input data is available in the agent, as knowledge context or through search tools. You can use the instructions in a Copilot agent to tell it to first go get the data, and then use code interpreter to process and analyze it.

Conclusion

AI agents with workflow context and details, a lot of high-quality knowledge, and the right skills enabled are the most powerful software that exists. It no longer takes teams of people years to develop AI assistants that are often more annoying than helpful. This is the first time in history that the most advanced technology in the world is available to anyone who wants it. What are you waiting for?

I always love hearing about the agents that you’re building and how they help your work. Tell me about them on whichever social network you found this post! And stay tuned: I still have a few more articles in this series on AI agent authoring.