What Investors Should Ask Before Funding An AI Company

The lazy investor question is: "What model do you use?" The useful investor question is: "What has to be true for this AI system to keep working when customers, attackers, regulators, and production volume arrive?"

That second question is less glamorous. It is also where the truth lives.

Here is the technical diligence scorecard I would use before funding an AI company. It is not meant to replace commercial diligence. Revenue, distribution, market timing, and founder quality still matter. But in AI companies, how the system is built, evaluated, secured, and paid for increasingly determines whether those business claims can survive.

Figure: Investor scorecard covering workflow depth, evaluation, security, unit economics, and learning loop. Source-informed; use it as a conversation guide, not a mechanical pass-fail checklist.

1. What Job Does The AI Actually Own?

A serious AI company should be able to describe the workflow it changes in boring detail. Who starts the job? What input arrives? What decision is made? What happens when confidence is low? Who reviews exceptions? What does the customer stop doing because the product exists?

Weak answer: "We automate enterprise knowledge work." Stronger answer: "We reduce contract intake review from three human passes to one review queue, with required human approval for pricing exceptions and regulated clauses."
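
To make the "boring detail" concrete, here is a minimal Python sketch of the routing logic the stronger answer implies. Everything in it is an assumption for illustration: the threshold value, the flag names, the queue names, and the idea that high-confidence results can skip review are hypothetical, not a description of any real product.

```python
from dataclasses import dataclass, field

# Hypothetical values chosen for illustration only.
AUTO_ACCEPT_THRESHOLD = 0.9
ALWAYS_REVIEW_FLAGS = {"pricing_exception", "regulated_clause"}


@dataclass
class IntakeResult:
    """Model output for one contract (hypothetical schema)."""
    contract_id: str
    confidence: float
    flags: set[str] = field(default_factory=set)


def route(result: IntakeResult) -> str:
    """Decide who handles the contract after the model has read it."""
    # Pricing exceptions and regulated clauses always need human approval,
    # no matter how confident the model is.
    if result.flags & ALWAYS_REVIEW_FLAGS:
        return "human_approval"
    # Low confidence goes to the single review queue.
    if result.confidence < AUTO_ACCEPT_THRESHOLD:
        return "review_queue"
    return "auto_accept"


print(route(IntakeResult("c-1", 0.97)))
print(route(IntakeResult("c-2", 0.97, {"regulated_clause"})))
print(route(IntakeResult("c-3", 0.60)))
```

A founder who can state these rules this plainly, and say who set the threshold and why, has answered most of the questions in this section.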

2. How Do You Know The Output Is Good?

Evaluation is the line between an AI feature and a product. I would ask for the test set, rubric, historical baseline, false-positive and false-negative examples, review process, and production monitoring plan. I would also ask who can change the rubric and how changes are versioned.

Public frameworks and benchmarks such as the NIST AI RMF and MLCommons AILuminate are useful reminders that measurement is not a one-time launch activity. It is a loop.
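
As a rough illustration of what "test set, baseline, false positives and false negatives" can look like in practice, the Python sketch below scores a batch of predictions against labels and checks for a regression against a stored baseline. The data, the baseline figures, and the 0.02 tolerance are made up for the example; a real rubric would be specific to the product.

```python
def confusion_counts(labels: list[bool], predictions: list[bool]) -> dict[str, int]:
    """Count true/false positives and negatives on a labeled test set."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for label, pred in zip(labels, predictions):
        if pred and label:
            counts["tp"] += 1
        elif pred and not label:
            counts["fp"] += 1
        elif label:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall(counts: dict[str, int]) -> tuple[float, float]:
    """Precision and recall, defined as 0.0 when the denominator is zero."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def regressed(current: tuple[float, float], baseline: tuple[float, float],
              tolerance: float = 0.02) -> bool:
    """True if any metric fell more than `tolerance` below the baseline."""
    return any(c < b - tolerance for c, b in zip(current, baseline))


# Made-up test set: True means "this item should be flagged".
labels = [True, True, True, False, False, False, True, False]
predictions = [True, True, False, False, True, False, True, False]
baseline = (0.80, 0.70)  # hypothetical (precision, recall) from the last release

counts = confusion_counts(labels, predictions)
current = precision_recall(counts)
print(counts, current, regressed(current, baseline))
```

The useful diligence question is whether something like this runs on every model or prompt change, and who is notified when the check reports a regression.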

3. What Can The System Leak, Trigger, Or Overtrust?

Every AI system has a blast radius. It may reveal data, execute a tool, write to a downstream system, generate unsafe content, or persuade a user to trust something that was not checked. OWASP's LLM Top 10 is a practical way to ask whether the team has tested prompt injection, output handling, sensitive information disclosure, excessive agency, and overreliance on unchecked output, which the 2025 edition covers under misinformation.

If the answer is "we will add guardrails later," translate that as "we have not priced security into the product yet."
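
One way to picture a limited blast radius is a gate in front of every tool call. The Python sketch below is a simplified assumption of how such a gate might work: read-only tools pass, tools that write to downstream systems need explicit human approval, and anything not on a list is denied. The tool names and the three outcomes are hypothetical.

```python
# Hypothetical tool names, for illustration only.
READ_ONLY_TOOLS = {"search_documents", "get_ticket"}
WRITE_TOOLS = {"update_record", "send_email"}


def authorize_tool_call(tool_name: str, human_approved: bool = False) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a requested tool call."""
    if tool_name in READ_ONLY_TOOLS:
        return "allow"
    if tool_name in WRITE_TOOLS:
        # Writes to downstream systems are where excessive agency hurts most.
        return "allow" if human_approved else "needs_approval"
    # Default-deny: a tool the team never listed cannot be triggered by a prompt.
    return "deny"


print(authorize_tool_call("search_documents"))
print(authorize_tool_call("send_email"))
print(authorize_tool_call("send_email", human_approved=True))
print(authorize_tool_call("delete_database"))
```

The design choice worth probing is the default: a team that denies unknown actions by default has thought about prompt injection differently from one that allows them.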

4. Do The Unit Economics Still Work At Volume?

AI gross margin is not decided by the model price alone. It is shaped by context length, retrieval, caching, retries, human review, latency targets, customer support, compliance overhead, and provider choice. Official pricing pages from model providers are useful, but the company should have its own cost traces from real workloads.

I would ask for cost per successful task, not cost per prompt. A prompt is not a business event. A resolved claim, approved review, qualified lead, answered ticket, or completed analysis is.
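
The difference between the two metrics is easy to show with a small Python sketch. The trace schema and the dollar figures are invented for the example; the point is only that failed attempts and human review belong in the numerator while only successful tasks belong in the denominator.

```python
from dataclasses import dataclass


@dataclass
class TaskTrace:
    """One attempted business task (hypothetical schema)."""
    model_cost: float   # model spend across all calls and retries, in dollars
    review_cost: float  # human review spend attributed to this task, in dollars
    succeeded: bool     # True if the task reached its business outcome


def cost_per_successful_task(traces: list[TaskTrace]) -> float:
    """Total spend on all attempts divided by the number of successful tasks."""
    total_cost = sum(t.model_cost + t.review_cost for t in traces)
    successes = sum(1 for t in traces if t.succeeded)
    if successes == 0:
        raise ValueError("no successful tasks; the metric is undefined")
    return total_cost / successes


def model_cost_per_attempt(traces: list[TaskTrace]) -> float:
    """The naive figure: model spend only, averaged over every attempt."""
    return sum(t.model_cost for t in traces) / len(traces)


# Made-up traces: two successes (one needing review) and two failures.
traces = [
    TaskTrace(model_cost=0.25, review_cost=0.0, succeeded=True),
    TaskTrace(model_cost=0.5, review_cost=2.0, succeeded=True),
    TaskTrace(model_cost=0.75, review_cost=0.0, succeeded=False),
    TaskTrace(model_cost=0.5, review_cost=0.0, succeeded=False),
]

print(model_cost_per_attempt(traces))    # 0.5
print(cost_per_successful_task(traces))  # 2.0
```

In this toy data the naive figure understates the real cost by a factor of four, which is the kind of gap that only shows up in the company's own traces.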

5. What Makes The Product Improve With Use?

"We have proprietary prompts" is not a moat. It might be a useful implementation detail, but it rarely explains defensibility. Better answers involve workflow data, distribution, customer-specific integrations, feedback loops, review datasets, domain-specific controls, and switching costs.

Figure: Weak AI diligence signals compared with stronger, evidence-backed answers. Editorial comparison; the goal is not to embarrass founders but to make weak claims easier to improve before serious diligence.

6. Who Owns The AI After It Ships?

Ask who owns model changes, eval regressions, security incidents, customer escalations, and compliance responses. If ownership is split across product, engineering, data science, legal, and customer success with no final accountable person, the system may work technically and still fail operationally.

7. What Is The Most Embarrassing Failure Mode?

Good teams can name their failure modes. They know where the system hallucinates, where retrieval is thin, where customers misuse it, where latency spikes, where human review is expensive, and where their claims are not yet proven.

I trust a founder more when they can say, "This is where the product is still brittle." It means they have looked.

The Real Test

An AI company does not need a perfect answer to every question. Stage matters. A seed company should not pretend to have late-stage controls. But the team should know which evidence exists, which evidence is missing, and which claims are too early.

The real test is whether the founder understands the system they are building. Not the model. The system.

References

  1. NIST AI Risk Management Framework.
  2. OWASP Top 10 for LLM Applications 2025.
  3. MLCommons AILuminate.
  4. SEC AI-washing enforcement release.
  5. ISO/IEC 42001 AI management systems.
  6. Artificial Analysis model comparisons.