When Scientific AI Forgets How It Got There

Douglas Welsh

Chief Revenue Officer

Pharma R&D has spent the last three years quietly making AI load-bearing. Models now propose targets, score compounds, design assays, summarize literature, and increasingly draft sections of regulatory documents. The conversation has been about capability — what AI can do. The conversation we are not having, and need to, is about provenance — what the institution can prove about how it did it.

The argument is narrow and specific: scientific AI without deterministic provenance is not just messy. It is institutionally dangerous.

By deterministic provenance, I mean the ability to point at any AI-derived result — a hit list, a predicted structure, an annotated cohort, a generated experimental protocol — and reconstruct, bit-for-bit, the data, code, model version, parameters, random seeds, and upstream transformations that produced it. Not "logged somewhere." Not "the analyst remembers." Reproducible on demand, by someone other than the person who first ran it.
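To make that bar concrete, here is a minimal sketch of what "reconstruct on demand" could look like for a single result. Everything in it is an assumption made for illustration: `score_compounds` is a hypothetical stand-in for a model, and `fingerprint`, `run_with_provenance`, and `reproduce` are invented names, not part of any particular platform. A real system would also need to pin the code version and the full data lineage, which this sketch leaves out.

```python
import hashlib
import json
import random


def fingerprint(obj) -> str:
    """Return a SHA-256 hex digest of a JSON-serializable object."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def score_compounds(compounds, seed):
    """Hypothetical stand-in for a model: seeded, so reruns are identical."""
    rng = random.Random(seed)
    return {name: round(rng.random(), 6) for name in sorted(compounds)}


def run_with_provenance(compounds, model_version, params, seed):
    """Run the computation and return the result with its provenance record."""
    result = score_compounds(compounds, seed)
    record = {
        "input_hash": fingerprint(sorted(compounds)),
        "model_version": model_version,  # illustrative label only
        "params": params,
        "seed": seed,
        "result_hash": fingerprint(result),
    }
    return result, record


def reproduce(compounds, record):
    """Rerun from the record and check the result matches the stored hash."""
    if fingerprint(sorted(compounds)) != record["input_hash"]:
        return False  # the inputs are not the ones originally used
    result, _ = run_with_provenance(
        compounds, record["model_version"], record["params"], record["seed"]
    )
    return fingerprint(result) == record["result_hash"]
```

The point of the sketch is the last function: someone other than the original analyst can call `reproduce` with the inputs and the record, and get a yes-or-no answer rather than a recollection.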

Most pharma AI today does not meet this bar. Notebooks reference data files whose schemas have since drifted. Foundation models are version-pinned on paper but rarely in practice. Pipelines stitch together SaaS APIs, internal scripts, and curated spreadsheets. The output looks crisp; the lineage is a smear.

This is fine when AI is a brainstorming partner. It becomes dangerous the moment AI output enters the institutional record — IND-enabling packages, IP filings, partnership data rooms, internal go/no-go decisions. Three failure modes follow.

First, regulatory exposure. FDA and EMA expectations are converging on reconstructable analyses. "Our model said so" is not a defense if the model, weights, and inputs cannot be reproduced. Sponsors who accelerate with AI but cannot defend the chain of derivation are accumulating a regulatory debt that comes due during inspection, not before.

Second, audit blindness when programs fail. Most clinical and preclinical programs fail. The institutional value of a failed program is the post-mortem — what did we believe, why did we believe it, where did the evidence break. When AI-derived intermediates cannot be reconstructed, the post-mortem cannot be performed honestly. The organization loses the ability to learn from its own failures, which is the most expensive form of institutional damage.

Third, decision velocity outrunning evidence integrity. AI compresses cycle times, which is the point. But the same compression means provenance debt accrues faster than humans can repair it. A team can make twelve AI-assisted decisions in the time it used to make one, with one-twelfth the lineage discipline. The danger is not that any one decision is wrong. It is that the organization can no longer tell which decisions rest on which evidence.

The fix is not more logging. Logging is observational and lossy. Deterministic provenance has to be a property of the system that produces results, not a record kept alongside it — data, code, and computation tracked in the same structure, with versioning and lineage that are queryable and reproducible. Designed in, not bolted on.
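One way to picture the difference between a log and a built-in property is a store in which a result cannot exist without its lineage. The sketch below is illustrative only and does not describe any specific product: `ProvenanceStore` and its methods are hypothetical names, and identifying a step by its function name is a simplification, since a real system would key on a versioned definition of the code.

```python
import hashlib
import json


class ProvenanceStore:
    """Toy store: each result is keyed by the step, parents, and parameters that made it."""

    def __init__(self):
        self._nodes = {}

    @staticmethod
    def _key(payload):
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

    def add_source(self, name, value):
        """Register raw input data; its key depends on its content."""
        key = self._key({"source": name, "value": value})
        self._nodes[key] = {"step": name, "parents": [], "params": {}, "value": value}
        return key

    def compute(self, func, parent_keys, **params):
        """Run func on the parents' values; the result is stored under its derivation."""
        key = self._key(
            {"step": func.__name__, "parents": list(parent_keys), "params": params}
        )
        if key not in self._nodes:
            inputs = [self._nodes[p]["value"] for p in parent_keys]
            self._nodes[key] = {
                "step": func.__name__,
                "parents": list(parent_keys),
                "params": params,
                "value": func(*inputs, **params),
            }
        return key

    def value(self, key):
        return self._nodes[key]["value"]

    def lineage(self, key):
        """List step names from first inputs to this result (shared ancestors may repeat)."""
        node = self._nodes[key]
        steps = []
        for parent in node["parents"]:
            steps.extend(self.lineage(parent))
        steps.append(node["step"])
        return steps


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def top_hits(scores, threshold):
    return [i for i, s in enumerate(scores) if s >= threshold]


store = ProvenanceStore()
raw = store.add_source("assay_readout", [1.0, 3.0, 4.0])
norm = store.compute(normalize, [raw])
hits = store.compute(top_hits, [norm], threshold=0.3)
print(store.lineage(hits))
```

Because the key of every result is derived from how it was produced, changing a parameter or an upstream input yields a different result with a different key, rather than silently overwriting the old one. The lineage is queried from the same structure that holds the data, not from a record kept beside it.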

Leaders evaluating AI tooling should ask a single question of every vendor and every internal team: if a regulator, a partner, or a future post-mortem asks how this result was produced, can we rebuild it from first inputs, without depending on the person who ran it? If the honest answer is no, the institution is not adopting AI. It is borrowing against its own credibility.

The organizations that win the next decade of scientific AI will not be the ones with the largest models. They will be the ones whose AI outputs are still defensible three years after the analyst has left.

This is the conviction behind how we built the DataJoint platform: provenance as a first-class property of scientific computation, not a logbook kept beside it.
