INVESTIGATION · AI x INTELLIGENCE
My New Analyst Is Fast, Fluent, and Occasionally Lies. I Don't Let It Decide.
6 min read · June 2026
Thesis
AI took over the parts of my job I will not miss. It is still bad at the part that was always the actual job.
The Question
On the first real task I gave it, sizing a market, the model handed me a clean, plausible number with a citation attached. The citation did not exist. Not a wrong source but an invented one, perfectly formatted, the kind of thing that walks straight into a client deck if you trust the fluency. That was the moment the question became concrete for me. I use these tools every day, so instead of arguing in the abstract about whether AI replaces research, I ran my own work through them and watched where they helped, where they broke, and where they handed me a confident lie. Which parts of competitive and market intelligence actually automate, and which only look automatable until you check the output?
The Hypothesis
My working belief, the open hypothesis I have been sitting on, is that most "AI replaces MI" claims confuse retrieval with judgement. Pulling and summarising information is not the job. Deciding what matters, what to trust, and what the non-obvious read is: that is the job. Stated plainly, AI eats the retrieval and drafting layer almost completely and barely touches the judgement layer, and the danger is that the retrieval layer looks so polished you stop checking whether the judgement underneath is real.
How I Looked At It
I took real tasks from my own week. A competitor scan. A first-pass market size. Summarising a long set of earnings commentary. Drafting an executive summary. I ran each through current general-purpose models, then checked the work the way I would check a junior analyst's. Same standard. No special pleading for the machine.

What I Found
The breaks first, because they are the part people skip. Beyond the invented citation, the competitor scan attributed a product to the wrong company, fluently and without a flicker of doubt. And the earnings summary did the thing that worries me most: it smoothed over the one strange data point. The outlier was the story, and the model treated it as noise and tidied it away, because tidy is what it optimises for. Three tasks, three different ways to be confidently wrong.
The help is real and I will not pretend otherwise. The drafting and summarising are better than I expected and faster than I am. Long, dull source material compressed cleanly. First drafts that saved me the blank-page tax. I do not miss doing that part by hand.
What it could not do at all was the part that was always the actual work. It did not know which of the twenty things in the scan was the one that mattered. It had no nose for the source that looked authoritative and was a recycled press release. It could not produce the contrarian read, the everyone-believes-X-and-the-org-charts-say-Y, because that comes from friction with the material, not retrieval of it. It is a fast, fluent, occasionally lying junior analyst that never gets bored and never gets suspicious. The never-suspicious part is the whole problem.
What It Means For Business
If your MI function mostly produces retrieval and summary, a lot of it is now automatable, and pretending otherwise is how you get caught flat-footed. But if you replace the judgement layer with the retrieval layer because the retrieval layer writes so confidently, you have not cut cost. You have bought a machine that produces plausible wrongness at scale and removed the person whose job was to catch it.
The Decision
If you handed me your MI function tomorrow, here is the call I would make. I would automate the first-pass scanning, the summarising, the drafting, and the formatting on day one, and reinvest the time saved in fewer, more senior people whose entire job is framing the question and smelling the bad source. The one thing I would never let the model do unsupervised is decide what matters, because that is the job, and it is the one thing it cannot yet be trusted with.
What I’d Test With More Time
I genuinely do not know where the line sits in two years, and I distrust anyone who says they do. I would want to test whether models get better at flagging the outlier instead of smoothing it, because that single behaviour is the difference between a tool I supervise and a tool I would let near a decision. Right now I supervise it. I am not close to letting it decide.