Lessons from Enterprise AI Projects

I've shipped AI systems for enterprises across EdTech, AgroTech, FinTech, security compliance, marketing intelligence, and a few other verticals I don't usually mention in polished write-ups because they were messy. Some of those projects were genuinely successful. A few were expensive. One in particular I'm going to describe in detail because it's more instructive than the wins, and I think people don't talk enough about the projects that just didn't work.

The patterns repeat. Here's what I've learned.

The three questions I ask before anything else

Before committing to any enterprise AI project, I need clear answers to three things. I don't mean vague answers. I mean someone in the room who can write the answer on a whiteboard and defend it.

What decision does this actually improve?

AI systems that work in production solve specific, well-defined problems. "Use AI to improve customer service" isn't a specification, it's a category. "Automatically categorize support tickets by urgency and route to the right team" is a specification. The difference matters because one of those can be measured, iterated on, and handed off to an on-call engineer at 2am. The other one can't.
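
To make the contrast concrete, here's roughly what the specific version looks like once it's pinned down. A minimal sketch, not from any real project; the categories, team names, and classify_urgency stub are placeholders for whatever model or rules actually do the work:

```python
from dataclasses import dataclass

# Hypothetical routing table and urgency rule; in a real project both would
# come from the team that owns the queue, not from the ML side.
ROUTING = {
    "billing": "payments-team",
    "outage": "sre-oncall",
    "account": "support-tier2",
}

@dataclass
class Ticket:
    id: str
    subject: str
    body: str

def classify_urgency(ticket: Ticket) -> str:
    """Stub for whatever model or rule set does the real classification."""
    return "critical" if "down" in ticket.body.lower() else "normal"

def route(ticket: Ticket, category: str) -> dict:
    # The output is a concrete, auditable decision: urgency, destination team,
    # and whether a human has to confirm before anything happens.
    urgency = classify_urgency(ticket)
    return {
        "ticket_id": ticket.id,
        "urgency": urgency,
        "team": ROUTING.get(category, "support-tier1"),
        "needs_human_review": urgency == "critical",
    }
```

Everything in that dict is something an on-call engineer can inspect, log, and argue about. That's what makes it a specification rather than a category.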

When teams can't articulate the decision they're improving, the project drifts. Scope expands. Success criteria shift. Six months in, someone asks "what are we actually building?" and nobody has a clean answer. I've sat in that meeting more than once and it's not a good place to be.

What does success look like numerically?

I want a number before we write a line of code. Some examples from projects I've actually shipped: reduce average ticket resolution time from around four hours to under one hour; achieve 93% accuracy on essay evaluation against human graders; process a million documents per quarter with fewer than 5% requiring human review. These numbers weren't perfect — we adjusted some of them as we learned more — but having them meant we could make real trade-off decisions during development instead of arguing about vibes in retrospectives.

Without a number, you can't know if you've succeeded. You can't defend the project when a skeptical stakeholder asks why it's taking so long. You definitely can't prioritize which failure mode to fix first.
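
One lightweight habit that helps: write the numbers down as data, next to the code, and check measured results against them on every evaluation run. A sketch with illustrative targets loosely modeled on the examples above, not a real contract:

```python
# Hypothetical success criteria, written down before development starts.
# The numbers are examples, not figures from a real engagement.
TARGETS = {
    "avg_resolution_hours": {"target": 1.0, "direction": "below"},
    "grading_accuracy": {"target": 0.93, "direction": "above"},
    "human_review_rate": {"target": 0.05, "direction": "below"},
}

def meets_target(name: str, measured: float) -> bool:
    spec = TARGETS[name]
    if spec["direction"] == "below":
        return measured <= spec["target"]
    return measured >= spec["target"]

# A monthly report can now say pass/fail instead of "it feels better".
print(meets_target("grading_accuracy", 0.91))  # False, not done yet
```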

What happens when the AI is wrong?

This one I ask partly because it reveals whether the team has thought seriously about the problem at all. Every AI system makes mistakes. The question isn't whether — it will — but what the error pathway looks like. Who catches it? What's the cost? How do you recover? Projects that can't answer these questions are projects that will fail in production. The only question is how visibly and at what cost.

The project that just didn't work

I want to talk about a specific one. An agricultural intelligence project — the goal was to build a recommendation engine that would help small farm operations make input purchasing decisions: when to buy fertilizer, which inputs to prioritize given weather forecasts and commodity prices, that sort of thing. The domain was genuinely interesting, the problem was real, and the people involved were smart and committed.

It didn't work. Not in production, anyway.

The core issue was data. We'd done due diligence — or thought we had — and the enterprise had assured us they had years of operational data. What they actually had was years of records, which turned out to be a very different thing. The data was fragmented across spreadsheets, two legacy systems that didn't talk to each other, and handwritten log books that someone had partially digitized. The digitization had introduced its own inconsistencies. Dates were in four different formats. Plot identifiers had been renamed twice. Input volumes were recorded in different units depending on who had entered them.
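
To give a flavor of what reconciliation meant in practice, here's a simplified sketch. The field names, alias table, and unit factors are invented, but the shape of the work is accurate: try known date formats, map renamed identifiers, convert everything to one unit before trusting a single row:

```python
from datetime import datetime

# The four date formats we actually hit were different from these, but the
# approach was the same: try known formats until one parses.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%y", "%b %d %Y"]

# Plot identifiers had been renamed twice; a lookup table was the only fix.
PLOT_ALIASES = {"P-12": "PLOT_012", "north-3": "PLOT_003"}  # hypothetical

# Input volumes arrived in whichever unit the person entering them preferred.
TO_KG = {"kg": 1.0, "t": 1000.0, "lb": 0.4536, "bag_50kg": 50.0}

def parse_date(raw: str) -> datetime:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")

def normalize_row(row: dict) -> dict:
    return {
        "date": parse_date(row["date"]),
        "plot_id": PLOT_ALIASES.get(row["plot"], row["plot"]),
        "input_kg": float(row["amount"]) * TO_KG[row["unit"]],
    }
```

None of this is hard. It's just endless, and every rule in it has to be confirmed with whoever entered the records in the first place.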

We spent roughly three months just on data reconciliation — time that wasn't in the original estimate because we hadn't audited the actual data before committing to the project. When we finally had a clean training set, it covered about 40% of the operational scenarios we'd promised to handle. The model we built was genuinely good on that 40%. On the rest, it hallucinated recommendations with enough plausibility that users didn't immediately recognize them as wrong.

We presented the results honestly. The client wasn't happy, which was fair. We hadn't delivered what we'd promised, even though the reasons were partly outside our control. The project ended. I think about it every time someone tells me their data is ready.

The lesson I took isn't "don't do AgroTech AI." It's: audit the actual data, not the assurances about the data, before you commit to a project. Open the files. Query the database. Talk to the person who enters records, not just the person who owns the system. The person entering records always knows where the skeletons are.

Why enterprise AI projects fail (the patterns I keep seeing)

Starting with the solution

"We need to use AI" is not a business requirement. It's a technology preference. Projects that start with the solution rather than the problem build impressive demos that get applauded in presentations and then gradually stop being used because they don't quite fit how people actually work. I'm not against ambitious AI projects — I've pushed for them — but the ambition should be attached to a problem, not a capability.

The data fantasy

"We have all the data we need" is almost never true when you dig in. This isn't unique to my AgroTech experience. I've hit it in FinTech (data locked in a vendor's system with no export API), in EdTech (student data fragmented across cohort years in different schemas), in compliance work (critical documents scanned as non-searchable PDFs). Audit data before committing to a project. What specifically will you use? Where does it come from? Who controls access? Do this before you hire the ML team, not after.

Pilot purgatory

Many enterprise AI projects succeed as pilots and fail as products. The pilot works on curated data, with dedicated attention, at limited scale. Production is messier. The data is dirtier. The users are less forgiving. The edge cases that didn't appear in the curated dataset turn up constantly in the wild. I've seen good pilots die in production transitions more times than I can count (I'm probably being imprecise here — let's say more than five times and leave it at that).

The fix is designing pilots that deliberately test production conditions. Use real data, not sanitized samples. Include skeptical users, not just champions. Plan the production transition before the pilot starts, not when the pilot results come in.

Requirements that move

Stakeholders change their minds. This is human and understandable and it will derail your project if you don't build in checkpoints that explicitly renegotiate scope. I've started using milestone-based development with locked requirements between milestones. When something changes — and it will — we have a defined process: acknowledge the change, estimate the impact, renegotiate timeline and resources explicitly. This sounds bureaucratic until you've been on a project where requirements drifted silently for six months and you're explaining to a client why you're delivering something different from what they thought they were getting.

Integration as afterthought

The model works beautifully in isolation. Then it needs to connect with the CRM, the ERP, the data warehouse, the SSO system, and sometimes a legacy mainframe that nobody fully understands. Each integration surfaces assumptions that don't hold. Each one takes longer than estimated. I've had single integrations eat two months of a project timeline.

Integration is not a phase. It's the project. Map integration requirements before you build the features that depend on them. Prototype the integrations first. Budget time generously, and then add another 30%.
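
"Prototype the integrations first" can be as unglamorous as a smoke test that tries to reach every system the product will depend on, run in week one instead of month five. A sketch assuming the systems expose HTTP endpoints at all; the URLs are placeholders:

```python
import requests  # assumption: the systems have HTTP APIs and you have access

# Hypothetical endpoints. In reality half the battle is discovering whether
# an API exists and who controls the credentials.
SYSTEMS = {
    "crm": "https://crm.example.internal/api/ping",
    "warehouse": "https://dwh.example.internal/health",
    "sso": "https://sso.example.internal/.well-known/openid-configuration",
}

def smoke_test(timeout: float = 5.0) -> dict:
    results = {}
    for name, url in SYSTEMS.items():
        try:
            resp = requests.get(url, timeout=timeout)
            results[name] = f"reachable ({resp.status_code})"
        except requests.RequestException as exc:
            results[name] = f"FAILED: {exc.__class__.__name__}"
    return results

if __name__ == "__main__":
    for system, status in smoke_test().items():
        print(f"{system:10s} {status}")
```

Every "FAILED" line in that output is a month of project risk you just discovered before it could hurt you.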

Patterns that actually work

Start narrow

The best enterprise AI projects I've been part of started with a single, tightly scoped use case. One project started with AI-assisted grading for a single rubric. We proved 93% accuracy. We built trust with the client. We expanded to 85 rubrics and somewhere around 250 tenants over the following year. The narrow start made the broad expansion possible — not just technically but organizationally, because we had evidence that the system worked before anyone asked us to stake more on it.

Human-in-the-loop by default

New AI systems should assist humans, not replace them. I know this sounds conservative and maybe it is. But it's also risk management: humans catch errors, handle edge cases, and generate the feedback that improves the system over time. Automation should come after trust is established, in domains where errors are recoverable, and always with monitoring that can detect drift. Moving fast here creates problems that slow you down much more than the cautious approach would have.
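
Concretely, "assist, not replace" usually comes down to a confidence gate: the system acts on its own only when it's sure, and everything else goes to a person whose decision gets captured as feedback. A minimal sketch; the threshold, queue, and Decision shape are illustrative, not a prescription:

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    item_id: str
    prediction: str
    confidence: float
    source: str  # "model" or "human"

@dataclass
class ReviewQueue:
    pending: list = field(default_factory=list)
    feedback: list = field(default_factory=list)  # becomes training data later

def handle(item_id: str, prediction: str, confidence: float,
           queue: ReviewQueue, threshold: float = 0.9) -> Decision:
    # Below the threshold, the model only suggests; a human decides.
    if confidence >= threshold:
        return Decision(item_id, prediction, confidence, source="model")
    queue.pending.append((item_id, prediction, confidence))
    return Decision(item_id, prediction, confidence, source="human")

def record_human_label(queue: ReviewQueue, item_id: str, label: str) -> None:
    # The correction is the most valuable artifact the system produces early on.
    queue.feedback.append({"item_id": item_id, "label": label})
```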

Evaluation infrastructure from day one

This might be the most important one. The projects that succeed have serious evaluation built in from the start: test sets that cover the full distribution of inputs (not just the happy path), automated pipelines that measure quality on every change, human review processes that generate ground truth, feedback mechanisms that capture what's actually happening in production. Without this, you're guessing about model quality. With it, you're engineering toward a measurable target.

I've seen teams treat evaluation as something you do at the end, before launch. That's backwards. Evaluation is how you know whether you're making progress. It should be the first thing you build.
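
Here's a minimal version of what I mean: a fixed test set, a score computed the same way on every change, and a hard failure when the score drops below the last accepted baseline. Everything below is a sketch; the file format, metric, and predict stub stand in for the real thing:

```python
import json
import sys

BASELINE_ACCURACY = 0.93  # hypothetical: the last accepted score

def predict(text: str) -> str:
    """Stand-in for the real model call; replace with actual inference."""
    return "unknown"

def evaluate(test_set_path: str) -> float:
    with open(test_set_path) as fh:
        # one JSON object per line: {"input": ..., "expected": ...}
        cases = [json.loads(line) for line in fh]
    correct = sum(1 for c in cases if predict(c["input"]) == c["expected"])
    return correct / len(cases)

if __name__ == "__main__":
    accuracy = evaluate("eval/test_set.jsonl")
    print(f"accuracy: {accuracy:.3f}")
    if accuracy < BASELINE_ACCURACY:
        sys.exit("regression: below baseline, refusing to ship")  # fails the build
```

Once something like this runs on every change, arguments about whether the model "got better" stop being arguments.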

Executive sponsorship with honest expectations

AI projects need executive support — for resources, for cover against organizational resistance, for surviving the inevitable setbacks. But that support has to come with honest expectations. The best sponsors I've worked with understood enough about the technology to set appropriate expectations themselves and defend the team when progress was slower than hoped. The worst ones had been sold on AI as magic and were perpetually disappointed that reality was messier.

The economics, which people avoid talking about

Development costs are almost always underestimated, usually by something like two or three times. Running costs — inference, monitoring, maintenance, retraining — are often ignored entirely until they show up in a budget review and cause a crisis. The value delivered needs to be quantified, not assumed, before the project gets approved.
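
The running-cost arithmetic isn't sophisticated, which is exactly why there's no excuse for skipping it. A back-of-envelope sketch with invented numbers; substitute your own volumes and prices:

```python
# All figures are hypothetical. The exercise matters, not these values.
requests_per_day = 20_000
tokens_per_request = 3_000          # prompt plus completion
price_per_million_tokens = 5.00     # USD, blended

inference_per_month = (
    requests_per_day * tokens_per_request * 30 / 1_000_000 * price_per_million_tokens
)
monitoring_and_reviews = 4_000      # tooling plus a part-time reviewer
retraining_amortized = 2_500        # periodic retraining spread monthly

total = inference_per_month + monitoring_and_reviews + retraining_amortized
print(f"inference ~ ${inference_per_month:,.0f}/month, total ~ ${total:,.0f}/month")
```

Ten minutes of this per project candidate would kill a surprising number of them before they start, which is the point.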

"Strategic importance" is not a business case. If the numbers don't work, the project shouldn't proceed. I've had to say this to clients. It's not a popular thing to say, but it's better than building something that can't justify its own existence in production.

What I actually tell people

Start with the problem, not the technology. Define success numerically before you build anything. Plan for failure — because the AI will be wrong, and you need to know what happens when it is. Budget more time than you think for data and integration, then add some more. Keep humans in the loop until you have months of evidence that the system is reliable. Invest in evaluation like it's the product, because in many ways it is.

The projects that follow this tend to succeed more than they fail. The ones that don't follow it tend to fail more than they succeed. The advice isn't complicated. But I've watched smart teams skip steps they thought were unnecessary and pay for it, so apparently it bears repeating.
