
AI-specific due diligence — model risk, data rights, vendor lock-in, demo vs. production gap.

The demo vs. production gap is the most common AI due diligence failure. A system performs impressively against vendor-curated test cases and poorly against your actual inputs. We test claimed capabilities against inputs representative of your use case, audit MLOps maturity, assess training data provenance and data rights, and evaluate vendor dependency risk — before you close.

Technical Due Diligence
The Problem

AI system due diligence has failure modes that general software due diligence does not. A system can have clean code, strong test coverage, and well-documented infrastructure — and still have AI capabilities that significantly underperform their claimed benchmarks on real-world inputs. Vendor-provided benchmark results are measured on evaluation datasets chosen to show the system favorably. Independent testing against inputs representative of the acquiring party's use case almost always produces different results.

AI-specific technical debt is invisible to code review focused on application quality. MLOps debt — training pipelines that cannot be reproduced, models without lineage, evaluation frameworks that do not reflect production conditions — affects system quality and improvability in ways that application code review does not surface. Data rights issues — training data with unclear licensing, scraping that violated terms of service, or PII in training datasets — are legal and reputational risks that require explicit investigation.

AI due diligence dimensions that code review alone misses
  • Capability testing against your representative inputs — not vendor-curated benchmarks
  • MLOps maturity: can the model be retrained, updated, and rolled back reliably?
  • Training data provenance and rights: can the dataset be audited for licensing and compliance?
  • Evaluation methodology quality: does the offline evaluation predict production performance?
  • Vendor dependency risk: what happens if a model API is deprecated or repriced?
  • The demo vs. production gap: does the system work on real user inputs, not just curated test cases?
Our Approach

We conduct AI technical due diligence in four layers: capability assessment (does the system do what it claims on your inputs?), code and infrastructure quality (is it maintainable and scalable?), AI-specific technical debt (MLOps maturity, data lineage, evaluation quality, data rights), and risk assessment (vendor lock-in, integration risk, operational risk at scale).

Capability assessment is conducted using your specific test cases, not vendor-provided benchmarks. We design a test dataset representative of your intended use and run the system against it, measuring the metrics that matter for your use case. This is the only reliable basis for acquisition decisions — vendor benchmarks are systematically optimistic.
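As a minimal sketch of what this testing looks like in practice: a labelled test set drawn from your own traffic, a metric that matters for your use case, and a comparison against the vendor's claimed figure. Everything here is illustrative — `call_system` is a hypothetical stand-in for the vendor's API, and the cases and numbers are invented.

```python
# Sketch of independent capability testing. `call_system` is a
# hypothetical stand-in for the vendor's API; the test cases and the
# claimed benchmark figure are illustrative.

def exact_match_accuracy(system, test_cases):
    """Fraction of cases where the system's output matches the label."""
    correct = sum(1 for inp, expected in test_cases if system(inp) == expected)
    return correct / len(test_cases)

def call_system(text):
    # Placeholder for the vendor API call; a deterministic stub here.
    return text.strip().lower()

# Representative cases carry real-traffic noise the demo never shows.
representative_cases = [
    ("  Refund Policy ", "refund policy"),
    ("SHIPPING delay??", "shipping delay"),
]

measured = exact_match_accuracy(call_system, representative_cases)
claimed = 0.97  # vendor's benchmark figure, for comparison
print(f"measured={measured:.2f} claimed={claimed:.2f} gap={claimed - measured:.2f}")
```

The design choice that matters is that the test set is sampled from your query distribution, not the vendor's, so the measured number is the one your users will actually see.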

Due diligence engagement structure

01
Scope definition and access requirements

Define the system components in scope, the acquisition or investment thesis, and the specific capability claims to test. Establish access requirements: API access, code repository, infrastructure documentation, data documentation, interview time with technical leads.

02
Independent capability testing

Design and execute tests using inputs representative of your use case. Document performance against your test cases and compare to claimed benchmarks. Map the demo vs. production gap explicitly.

03
Infrastructure and code audit

Review system architecture, code quality, test coverage, deployment processes, and operational procedures. Assess scalability and identify infrastructure risks at target scale.

04
AI-specific debt and data rights assessment

Audit MLOps maturity: experiment tracking, model registry, retraining pipeline, monitoring. Audit training data provenance, annotation quality, and licensing. Identify data rights risks.

05
Risk register and findings report

Prioritized findings report separating deal-breaker issues from negotiation-relevant items. Vendor dependency risk, integration risks, and operational cost model at target scale.

What Is Included
01

Independent capability testing against your inputs

We test AI capabilities against inputs representative of your use case — not vendor-provided benchmarks. This produces an honest measure of what you are acquiring, independent of how the vendor chose to present their system. The demo vs. production gap is quantified explicitly.

02

MLOps maturity assessment

Can models be retrained reliably? Are experiments reproducible? Is there a model registry with documented promotion criteria? Is production monitoring in place? MLOps debt is expensive to retrofit and determines how quickly the system can improve after acquisition.
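The questions above can be treated as an explicit rubric during audit interviews. The sketch below shows one way to score them; the criteria and the pass/fail findings are illustrative, not a standard maturity model.

```python
# Illustrative MLOps maturity rubric. The criteria and findings below
# are examples, not a standardized scoring framework.

MATURITY_CRITERIA = {
    "experiment_tracking":   "Are training runs logged with parameters and metrics?",
    "model_registry":        "Is there a registry with documented promotion criteria?",
    "reproducible_training": "Can the production model be retrained from scratch?",
    "rollback":              "Can a previous model version be restored quickly?",
    "production_monitoring": "Are prediction quality and drift monitored live?",
}

def maturity_score(findings):
    """findings maps each criterion to True/False from the audit."""
    met = sum(1 for name in MATURITY_CRITERIA if findings.get(name, False))
    gaps = [name for name in MATURITY_CRITERIA if not findings.get(name, False)]
    return met / len(MATURITY_CRITERIA), gaps

score, gaps = maturity_score({
    "experiment_tracking": True,
    "model_registry": False,
    "reproducible_training": True,
    "rollback": False,
    "production_monitoring": True,
})
print(f"maturity={score:.0%}, gaps={gaps}")
```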

03

Training data provenance and rights audit

We audit training data provenance, annotation quality documentation, and compliance posture. Unlicensed training data, scraping that violated terms of service, or PII in training datasets are legal and operational risks that need to surface before close.

04

Vendor dependency and lock-in analysis

We identify which capabilities depend on specific model APIs (OpenAI, Anthropic, Google), assess the risk of deprecation or pricing changes, and evaluate the portability of the system to alternative providers. Abstraction layer quality determines how expensive migration would be.
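A sketch of the kind of abstraction layer we look for when assessing lock-in. The provider names are real companies, but the interfaces are illustrative — not the vendors' actual SDK signatures, which are stubbed out here.

```python
# Illustrative provider abstraction layer. Real code would call the
# providers' SDKs inside each class; the interfaces here are
# hypothetical, used only to show the structure.

from abc import ABC, abstractmethod

class CompletionProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAIProvider(CompletionProvider):
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"  # SDK call stubbed for illustration

class AnthropicProvider(CompletionProvider):
    def complete(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"  # SDK call stubbed for illustration

def summarize(provider: CompletionProvider, text: str) -> str:
    # Application code depends only on the interface, so swapping
    # providers is a constructor change rather than a rewrite.
    return provider.complete(f"Summarize: {text}")
```

Systems whose application code calls a vendor SDK directly at every site are the ones where migration cost is highest; an interface like this concentrates the provider dependency in one place.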

05

Infrastructure scalability analysis

We model infrastructure cost and performance at target scale. Systems that work at current load may have architecture bottlenecks or cost structures that do not scale to the acquiring party's volume requirements.
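A simplified version of this kind of unit-economics extrapolation is sketched below. All figures are hypothetical inputs; a real model would also account for volume discounts, caching, and non-linear infrastructure costs.

```python
# Illustrative inference cost extrapolation. All figures are
# hypothetical; real engagements use the target's actual unit costs.

def monthly_inference_cost(requests_per_month, tokens_per_request,
                           price_per_1k_tokens):
    """Linear per-token cost model: requests x tokens x unit price."""
    return requests_per_month * tokens_per_request / 1000 * price_per_1k_tokens

current = monthly_inference_cost(200_000, 1_500, 0.01)      # target's load today
at_scale = monthly_inference_cost(20_000_000, 1_500, 0.01)  # acquirer's volume
print(f"current=${current:,.0f}/mo  at_scale=${at_scale:,.0f}/mo")
```

Even this naive linear model makes the point: a cost structure that is trivial at the target's current volume can become a dominant line item at the acquirer's volume, which is exactly the finding the risk register needs to surface.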

Deliverables
  • Independent capability assessment report with testing results against representative inputs
  • Demo vs. production gap analysis
  • Code and infrastructure quality assessment with scalability risk identification
  • MLOps maturity assessment: experiment tracking, model registry, retraining, monitoring
  • Training data provenance and data rights audit
  • Vendor dependency and lock-in risk assessment
  • Risk register with deal-breaker issues and negotiation-relevant findings
Projected Impact

Technical due diligence that independently tests AI capabilities and surfaces MLOps debt, data rights issues, and vendor lock-in risk gives you the evidence that should drive acquisition pricing, integration planning, and the post-acquisition roadmap. The cost of a due diligence engagement is small relative to the cost of discovering these issues after close.

FAQ

Common questions about this service.

What is the demo vs. production gap and how do you measure it?

The demo vs. production gap is the difference between how a system performs on vendor-curated test cases and how it performs on real user inputs that the vendor did not select. We measure it by designing a test dataset representative of your intended use — based on your user base, query distribution, and edge cases — and running the system against it. The gap is quantified in the same metrics the vendor used for their benchmark claims.
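Concretely, the quantification step reduces to a per-metric comparison. The sketch below assumes you already have scores on both the vendor benchmark and your representative set; the metric names and figures are illustrative.

```python
# Sketch of quantifying the demo vs. production gap in the vendor's
# own metrics. Scores here are illustrative.

def demo_production_gap(vendor_scores, representative_scores):
    """Per-metric difference: positive values mean the system degrades
    on representative inputs relative to the vendor benchmark."""
    return {m: round(vendor_scores[m] - representative_scores[m], 3)
            for m in vendor_scores}

gap = demo_production_gap(
    vendor_scores={"accuracy": 0.96, "f1": 0.94},
    representative_scores={"accuracy": 0.81, "f1": 0.72},
)
print(gap)
```

Reporting the gap in the same metrics the vendor used for its claims keeps the comparison apples-to-apples and makes the finding directly usable in price negotiation.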

What are the most common deal-breaker findings in AI due diligence?

We flag these explicitly: capability that significantly underperforms claimed benchmarks on representative inputs, training data with licensing or compliance issues, no model retraining capability (the system cannot improve or be updated), critical security vulnerabilities including exposed training data, or cost models that are not viable at required scale. These are reported separately from negotiation-relevant findings.

Can you assess both API-based AI systems and custom-trained models?

Yes, with different emphases. For API-based systems we focus on prompt engineering quality, output handling, vendor dependency risk, and the cost model at scale. Custom-trained models add MLOps maturity, training data quality, evaluation methodology quality, and model portability. We tailor the assessment to the architecture.

What access do we need from the target company?

At minimum: system documentation, architecture diagrams, and API access for capability testing. For full due diligence: code repository access (read-only), infrastructure documentation for cost analysis, data documentation, and interview time with technical leads. We scope the engagement based on available access under the NDA structure in place.

Ready to get started?

Tell us what you are building. We will scope it, price it honestly, and give you a clear plan.

Start a Conversation

Free 30-minute scoping call. No obligation.