Engineering Principles. Scientific Practice.

The vision, team, and story behind DataJoint

What if complexity could *fuel discovery*, instead of breaking it?

Today’s labs are built for improvisation, not scale. Most rely on fragmented tools, poorly documented workflows, and manual processes strung together with brittle scripts. They function, until complexity overwhelms them.

And complexity is rising fast. We’ve stretched loosely managed systems to the breaking point. The result? A crisis of replication, widespread waste, and progress that’s incremental at best.

At the same time, the potential for acceleration has never been greater. AI and automation promise to transform the speed and depth of discovery. But AI will only amplify the mess unless its inputs are reliable, structured, and context-rich.

Science has hit a complexity ceiling. DataJoint exists to break through.

“The common goal … is to accelerate scientific knowledge generation, potentially by orders of magnitude, while achieving greater control and reproducibility in the scientific process.”

Automated Research Workflows for Accelerated Discovery

The *Computational Database*

Experimental science requires both reproducibility and flexibility.

A computational result that cannot be recreated is not valid science. Reproducing a given result requires linking it to the raw data, metadata, code, parameters, and sequence of transformations that produced it.

But experiments are always changing: code, parameters, algorithms, instruments, processes. Every change threatens to sever one of those critical links. At today’s level of complexity, the conditions for reproducibility are rarely met in studies of any significant scale.

DataJoint’s core innovation is a database model that delivers flexibility without sacrificing integrity or reproducibility.

Our solution, the computational database, is fundamental infrastructure for reproducible science. It unifies every aspect of a study (data, code, and workflows) and manages both computation and change. It makes scientific processes flexible, repeatable, and ready for next-generation AI.
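To make the idea of "linking" a result to its origins concrete, here is a minimal Python sketch that derives an identity for a result from everything that produced it: a checksum of the raw data, the parameters, a code version, and the ordered list of transformation steps. It is a hypothetical illustration of the principle, not DataJoint's API; the function `provenance_key`, its arguments, and the sample inputs are all assumptions made for this example.

```python
import hashlib
import json


def provenance_key(raw_data: bytes, params: dict, code_version: str, steps: list[str]) -> str:
    """Hypothetical helper (not part of DataJoint): return a hex digest that
    changes whenever any input to a computed result changes."""
    record = {
        "raw_data_sha256": hashlib.sha256(raw_data).hexdigest(),  # fingerprint of the raw data
        "params": params,                # analysis parameters
        "code_version": code_version,    # e.g. a git commit or release tag
        "steps": steps,                  # kept in order: the sequence of transformations matters
    }
    # sort_keys yields a canonical serialization, so identical inputs give identical keys
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Illustrative usage with made-up inputs
raw = b"example recording"
steps = ["filter", "detect_events"]
key_v1 = provenance_key(raw, {"threshold": 4.0}, "v1.0", steps)
key_v2 = provenance_key(raw, {"threshold": 4.5}, "v1.0", steps)

results = {key_v1: "result computed with threshold 4.0"}
print(key_v2 in results)  # False: after the parameter change, the stored result no longer applies
```

Because the key covers parameters, code version, and step order, changing any of them produces a new key instead of silently reusing an old result. A real system would also have to store these records and enforce the links between them; this sketch shows only how a result's identity can be tied to its inputs.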

The *SciOps Discipline*

The computational database provides the infrastructure. SciOps defines the discipline.

SciOps brings structure and operational rigor to every stage of the research process, using technology-enabled methods that raise a lab’s operational maturity.

SciOps replaces disconnected tools and manual handoffs with a continuously running system for research. It demands an integrated approach to scientific work: modular workflows, automated quality control, versioned code and processes, and real-time collaboration around shared pipelines.

DataJoint is helping define the SciOps discipline, co-leading an alliance of academic and industry partners with the Johns Hopkins Applied Physics Laboratory. Email us to learn more about SciOps or the Alliance.

“SciOps is a methodology that unifies experimental design, data collection, processing, analysis, and dissemination into a seamless, repeatable pipeline that enhances efficiency, reproducibility, and scalability in scientific research.”

Erik Johnson et al., SciOps: A New Operational Model for Reproducible Science (2024), under review at Nature Methods
SciOps Maturity Levels

Level 1: Initial. Ad Hoc Processes; DIY Custom Development
Level 2: Managed. Established Processes; Repeatable Processes; Role Specialization; Quality Control
Level 3: Defined. Sharable Processes; Open-Source Ecosystems; FAIR Data; FAIR Workflows
Level 4: Scalable. Automated Workflows; SciOps Pipeline; Collaborative Environments; Teamflow
Level 5: Optimizing. Closed-loop Discovery; AI + Human in the Loop

SciOps Core Principles

Modularity
Automation
Transparency
Traceability
Continuous Improvement

The *History*

Built for scientists. Proven at scale. Open by design.

DataJoint’s story began with a scientist: Dimitri Yatsenko, an expert in data architecture and systems engineering who set aside a successful career to study the brain. The neuroscience lab presented an all-too-common scene: cutting-edge experiments with fragile workflows, burdensome manual processes, and a lack of rigor. So he invented a new type of system, a computational database, and released it as an open-source project called DataJoint.

DataJoint quickly gained traction in high-stakes, high-complexity research – such as the landmark MICrONS study recently published in Nature. It has enabled dozens of labs to collaborate, process petabytes of data, and push the limits of what’s scientifically possible.

In 2020, NIH stepped in to amplify DataJoint’s reach, funding our evolution from a DIY system used primarily by big-budget projects with significant engineering resources into an accessible commercial platform within reach of every lab.

Today, DataJoint’s operating platform has been adopted by leading labs across systems neuroscience, pathology, and rehabilitation. And while the platform has grown in capability and support, its foundation remains open: DataJoint Python gives labs a common language to describe their data, code, and computational workflows. Anyone can read, understand, and extend your pipeline. And you can take your data with you.

Trusted by leaders in data-intensive science

Your Partner in *SciOps Transformation*

Equipping your lab for the next level of performance.

We’re a team of world-class experts in life sciences, scientific computing, data engineering, and research operations. We’re here to support your scientific goals, bringing systems, practices, and expertise developed in leading research environments around the world to help you level up your capabilities without disrupting your science.

Life Sciences

Multimodal investigation of biological systems: neuroscience, behavior, oncology, -omics, kinematics, and more.

Computer Science

Reproducible pipelines, automated processing, end-to-end governance of the data supply chain.

Data Science

Signal extraction, quality control, computational methods, AI/ML analysis.

Operations

 ("Ops") for maximum efficiency, scalability, and reduced errors.

Meet the team that makes it happen.

Dimitri Yatsenko, PhD

Founder • Chief Science & Technology Officer

BS, MS Computer Science - Utah State University • MS Computer Engineering - University of Utah • PhD Neuroscience - Baylor College of Medicine

Builds the foundations of scalable, reproducible science

Jim Olson

Chief Executive Officer

BA Mathematics - Hamline University • MA - Luther Seminary

Leads execution, scales teams, delivers transformation

Monty Kosma, JD

Co-Founder • Chief Marketing Officer

BS Physics - Harvey Mudd • JD - University of Chicago

Articulates what's next for science, and how we get there

Thinh Nguyen, PhD

SciOps Lead

PhD Biomedical Engineering - University of Houston

Builds data platforms to transform Scientific Operations with AI and automation

Timothy Reiland

Director of Operations and Finance

BA Psychology - Rice University

Manages finances, operations, and grant compliance

Corprew Reed

Engineering Lead

AB History - University of Chicago

Organizes information. Scales up. Builds platforms.

Milagros Marín, PhD

SciOps Engineer

PhD Biochemistry and Molecular Biology - University of Granada

Builds AI-powered data platforms with scalable, automated workflows to accelerate scientific discovery.

Kushal Bakshi, PhD

SciOps Engineer

PhD Neuroscience - Texas A&M University

Drives platform adoption through deployment, training, and support

Talk to us about high-impact science.

See how DataJoint can help your lab move faster, stay organized, and advance your scientific aims with less effort and overhead.