Why Others Think AI Is a Miracle But You Think It’s Useless

[Image: a crowd of identical people saying, “The next model will change everything. Believe me bro!”]

Will the next model release convince you?

As I’ve written before, AI can solve nearly any precisely described, self-contained task. Trick questions still trip it up, but otherwise today’s AI is PhD-level and beyond. I’m betting you’ve heard similar hype, and you think it’s overblown. “If AI is so great, how come it’s barely valuable to me?”

I could tell you that the next model is the one we’ve been waiting for, that the recent exponential growth proves the future will be amazing. Or I could say that you just need to learn how to use it better. I won’t do that. There’s nothing wrong with you, but AI has some problems that we need to address. Today’s AI falls short of its hype for many people, for three big reasons:

  • It often doesn’t have the data it needs to work with
  • Defining tasks precisely is very difficult
  • There’s little AI can do other than give you text or images

Missing Grounding Data

Hallucinating AI is not useful for many productive tasks, although it’s great for poetry. If you’ve been using ChatGPT regularly, you might be surprised to learn that hallucination is largely a solved problem for Microsoft Copilot. I don’t see hallucinations even after dozens of prompts (although it can still be tricked into hallucinating), because Copilot is trained to respond only with information it has been able to find through search. At Microsoft, we rigorously test all AI products and features for “groundedness,” which is to say that responses are “grounded” in fact.

But this only works to solve hallucination if (a) the source data exists, (b) the source data is true, and (c) the AI product is focused on using it. All three can be a problem, but the third is the most common. Search-based products, including Perplexity and Google AI Overviews, do a pretty good job of grounding responses and citing sources. On the other hand, chat-based products prioritize other things, like fun or programming ability, and they may not even implement a search function. They are definitely not gating feature releases on sufficient groundedness.

When you’re working on something that shouldn’t have hallucinations, phrase your questions so they’re likely to have source facts behind them. Just as importantly, use an AI product that is careful to cite its sources and almost never hallucinates.
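
To make that concrete, here is a minimal sketch of a grounding-first answer flow. The search and llm callables are hypothetical stand-ins for whatever search API and model you use, not any specific product’s API; the shape of the flow is the point.

```python
from typing import Callable

# Hypothetical plumbing, not any specific product's API:
#   search(question) -> list of {"url": ..., "snippet": ...} dicts
#   llm(prompt)      -> the model's text response
GROUNDED_INSTRUCTIONS = (
    "Answer ONLY using the sources provided below. "
    "Cite the source URL after every claim. "
    "If the sources do not contain the answer, say you don't know."
)

def grounded_answer(
    question: str,
    search: Callable[[str], list[dict]],
    llm: Callable[[str], str],
) -> str:
    sources = search(question)
    if not sources:
        # Refusing beats improvising: no sources means no answer.
        return "I couldn't find any sources for that, so I won't guess."

    # Put the retrieved snippets into the prompt so the model has facts to lean on.
    source_block = "\n\n".join(
        f"[{i + 1}] {s['url']}\n{s['snippet']}" for i, s in enumerate(sources)
    )
    prompt = f"{GROUNDED_INSTRUCTIONS}\n\nSources:\n{source_block}\n\nQuestion: {question}"
    return llm(prompt)
```

The refusal branch is the important part: a product that would rather say “I don’t know” than improvise is the one that almost never hallucinates.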

Poorly Defined Tasks

I define “AI slop” as text that is neither reviewed nor read, and it’s increasingly a problem on the internet. In my opinion, a major cause of slop is buttons with a 🪄 emoji on them. Magic buttons produce extremely average results. A magic button will never return an answer in the voice of Glorglax the Mighty. If you want something interesting, you have to ask for it. Meanwhile, I hope you can write better than aggressively average AI slop!

[Image: example response from Claude (Sonnet 3.5)]

Mathematics is another story entirely, and there’s a reason the frontier labs focus on it: it is unambiguous. A math problem means exactly one thing and nothing else. Average math answers, when correct, are perfect. The rest of the world is not so easily defined. Conversation (such as text chat or voice) is the most natural way to define a task, but defining one that way still takes time and effort.

There’s no shortcut to this. You must think hard, prompt, review the results, and iterate.
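
As a made-up illustration of what that effort buys you, compare what a magic button sends on your behalf with a precisely defined version of the same task. Every detail in the second prompt is invented; the specificity is the point.

```python
# A magic button sends something generic like this on your behalf:
magic_button_prompt = "Summarize this document."

# A precisely defined task spells out audience, length, priorities, and voice:
precise_prompt = """Summarize this document for a team deciding whether to adopt the proposal.
- 150 words or fewer.
- Lead with the single biggest risk.
- End with the three questions we still need answered.
- Write it in the voice of Glorglax the Mighty."""
```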

Inability to Act

Most AI products return text to you. You have to do the work of copying and pasting that text into wherever it needs to go. Some AI features are built into those systems, but they are the same ones most likely to hide behind 🪄 buttons.

The common solution is to introduce tool use into the AI product. Tools work by instructing the model to output text like “web-search: search term”, having outside code perform that web search, adding the search results to the LLM’s context, and letting the LLM continue. Getting this right is quite difficult, even when only a few tools are included. But most of our work involves dozens of systems, each with hundreds of functions! Perhaps models will be better at doing things one day, but today they are not.
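
Here is a rough sketch of that loop with a single search tool. The llm and search callables are the same kind of hypothetical stand-ins as in the earlier sketch, and the “web-search:” convention is just the one described above; real products use structured function-calling APIs, but the shape is similar.

```python
from typing import Callable

def run_with_tools(
    request: str,
    llm: Callable[[str], str],            # hypothetical model call
    search: Callable[[str], list[dict]],  # hypothetical web search returning snippet dicts
    max_steps: int = 5,
) -> str:
    """Let the model interleave 'web-search:' tool calls with its own answer."""
    context = request
    reply = llm(context)
    for _ in range(max_steps):
        if not reply.startswith("web-search:"):
            # No tool call means the model considers this its final answer.
            return reply
        # The model asked for a search; outside code actually performs it.
        query = reply.removeprefix("web-search:").strip()
        snippets = "\n".join(s["snippet"] for s in search(query))
        # Append the results to the context and let the model continue.
        context += f"\n\n[Search results for '{query}']\n{snippets}"
        reply = llm(context)
    return reply  # give up after max_steps rounds of tool use
```

Even this toy version has obvious failure modes: the model can format the tool call slightly wrong, search for the wrong thing, or burn through every step without finishing. Multiply that by dozens of systems with hundreds of functions each and the difficulty becomes clear.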

The combination of missing grounding data, needing to take time to define tasks precisely, and only getting as far as producing text means that AI is difficult to use. First you have to recognize whether a task will be valuable and not prone to hallucination. Then you need to work hard to define the task precisely, and finally, you must translate the result into the place where you actually need it. This isn’t natural and it is time-consuming.

The results can be incredible, but I do not blame you if you think AI is worthless. It’s difficult and time-consuming, and it’s going to produce a lot of so-so results until you overcome these hurdles.

It is up to those of us working on AI products to make them more factual, to make it easier to get what you want, and to integrate AI into the software you use every day. I know that’s what I’m working on! I do believe in AI’s promise, but until we deliver those solutions, it doesn’t matter how good the models are.