Written by

Dimitri Yatsenko, PhD

Founder • Chief Science & Technology Officer

June 23, 2017

DataJoint ERDs are DAGs

Have you noticed that DataJoint’s ERDs (entity-relationship diagrams) form directed acyclic graphs (DAGs)? For example, the following ERD depicts the preprocessing pipeline for two-photon imaging data in Andreas Tolias’ Lab (the code is at https://github.com/cajal/pipeline).

Figure: *An entity-relationship diagram of a schema for processing two-photon imaging data from a resonant-scanning microscope.*

In this diagram, all dependencies point downward. Every edge corresponds to a foreign key from the downstream node to the upstream one. Note that the arrows show the direction of dependency, which is opposite to the direction of the foreign key.

Thus the ERD has no loops. This makes sense if you keep in mind that DataJoint is designed to support data pipelines, i.e. sequences of steps performed in the course of a study, from data acquisition to processing to analysis.
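
To make the "no loops" property concrete, here is a minimal sketch that checks a dependency graph for cycles with Python's standard `graphlib` module. It is not how DataJoint itself is implemented, and the table names are hypothetical stand-ins rather than the actual tables of the pipeline above. Each table maps to the set of upstream tables it references.

```python
from graphlib import TopologicalSorter, CycleError


def is_acyclic(dependencies):
    """Return True if the dependency graph contains no cycles.

    `dependencies` maps each table name to the set of upstream
    tables that it references through foreign keys.
    """
    try:
        # prepare() raises CycleError if any cycle exists.
        TopologicalSorter(dependencies).prepare()
    except CycleError:
        return False
    return True


# Hypothetical pipeline: every table depends only on tables above it.
pipeline = {
    "Session": set(),
    "Scan": {"Session"},
    "Segmentation": {"Scan"},
    "Trace": {"Segmentation"},
}

# A table with a foreign key into itself forms a cycle of length one.
self_referencing = {"Employee": {"Employee"}}

print(is_acyclic(pipeline))          # True
print(is_acyclic(self_referencing))  # False
```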

An investigator recently asked me whether DataJoint’s commitment to acyclic dependencies is a limitation of its representational power. After all, conventional E-R designs do not have a consistent direction and can form cycles. Textbooks on database design often feature tables with foreign keys into themselves.

For example, Panel A of the following figure depicts a textbook example of a cyclic relationship. A member of the Employee class may optionally have a manager who is also an Employee. This common design is often translated into a relational design with a table with a nullable foreign key referencing itself.

Figure:
A) A textbook cyclic relationship: an Employee may be managed by another Employee.
B) The same relationship refactored without cycles by adding the new entity Subordinate.
C) An equivalent DataJoint ERD.

However, the same relationship can be expressed with an acyclic design (Panel B) by introducing a new entity class Subordinate with two relationships to Employee: “is a” and “reports to”. This design translates into two tables: Employee with no foreign keys and Subordinate with two foreign keys into Employee. The first foreign key is defining: it forms the primary key of Subordinate. The second foreign key is made from dependent attributes. Panel C depicts the DataJoint ERD for this design.
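
The refactoring from Panel A to Panel B can be sketched in plain Python. This is an illustration rather than DataJoint code: the function name `split_self_reference` and the sample rows are hypothetical, and a missing manager is assumed to be represented as `None`.

```python
def split_self_reference(rows):
    """Split self-referencing employee rows into two acyclic relations.

    `rows` is an iterable of (emp_id, manager_id) pairs, where
    manager_id is None for an employee without a manager.
    Returns (employees, subordinates).
    """
    rows = list(rows)  # allow generators, since we iterate twice
    # Employee keeps every employee and carries no reference to itself.
    employees = [emp_id for emp_id, _ in rows]
    # Subordinate keeps only employees who actually report to someone,
    # so no nullable column is needed.
    subordinates = [
        (emp_id, manager_id)
        for emp_id, manager_id in rows
        if manager_id is not None
    ]
    return employees, subordinates


# Hypothetical rows from a self-referencing table (Panel A).
rows = [(1, None), (2, 1), (3, 1), (4, 2)]
employees, subordinates = split_self_reference(rows)
print(employees)     # [1, 2, 3, 4]
print(subordinates)  # [(2, 1), (3, 1), (4, 2)]
```

Employee 1 has no manager and therefore simply does not appear in the second relation.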

The acyclic design has multiple advantages. The foreign keys are no longer nullable: if an employee does not report to anyone, her entry is excluded from Subordinate altogether. The data become easier to enter, modify, and delete. For example, employees can be entered in any order, followed by the reporting relationships. Deleting a subset of employees becomes straightforward with a single cascading delete. With a self-referencing employee table, all these operations become problematic.
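
These advantages can be tried out on a small scale. The sketch below uses plain SQL through Python's built-in `sqlite3` module rather than DataJoint, purely as an assumption-laden stand-in: the table layout mirrors Panel B, the employee names are invented, and the `ON DELETE CASCADE` clauses imitate a cascading delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    fullname TEXT NOT NULL
);
CREATE TABLE subordinate (
    emp_id     INTEGER PRIMARY KEY
               REFERENCES employee (emp_id) ON DELETE CASCADE,
    reports_to INTEGER NOT NULL
               REFERENCES employee (emp_id) ON DELETE CASCADE
);
""")

# Employees can be entered in any order: no row refers to another employee.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(3, "Carol"), (1, "Alice"), (2, "Bob")],
)
# Reporting relationships are entered afterwards.
conn.executemany(
    "INSERT INTO subordinate VALUES (?, ?)",
    [(2, 1), (3, 1)],  # Bob and Carol report to Alice
)

# Deleting an employee removes every subordinate row that refers to her.
conn.execute("DELETE FROM employee WHERE emp_id = 1")

remaining_employees = [
    row[0] for row in conn.execute("SELECT emp_id FROM employee ORDER BY emp_id")
]
remaining_links = conn.execute("SELECT COUNT(*) FROM subordinate").fetchone()[0]
print(remaining_employees)  # [2, 3]
print(remaining_links)      # 0
```

Note that `reports_to` is declared `NOT NULL`: an employee without a manager has no row in `subordinate` at all.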

The Python code defining these two DataJoint classes would be as follows:

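    import datajoint as dj

    # Hypothetical schema name, chosen for illustration only.
    schema = dj.schema('company')

    # Department is assumed to be defined elsewhere in the same schema.
    @schema
    class Employee(dj.Manual):
        definition = """  # company employee
        emp_id : int  # employee id within the company
        ---
        fullname : varchar(120)
        date_of_birth : date
        hire_date : date
        -> Department
        """

    # The second foreign key refers to Employee under the new name reports_to.
    @schema
    class Subordinate(dj.Manual):
        definition = """  # employee who reports to a manager
        -> Employee
        ---
        (reports_to) -> Employee
        """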

Any ER design with a cyclic network of relationships can be refactored as a directed acyclic graph.

The synaptic connectivity example from yesterday’s post provides another example of transforming a cyclic relationship into an acyclic one.

The directed acyclic structure of DataJoint’s pipelines makes them easier to interpret, gives their diagrams a predictable appearance, and enables more consistent internal handling of dependencies (e.g. in cascading deletes). The downward flow of dependencies suggests possible workflows: the data at the top of the pipeline are populated first, and the next steps are inferred from the graph.
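
As a rough illustration of how a workflow can be read off the graph, the following sketch derives a population order and the set of tables touched by a cascading delete. It relies on Python's standard `graphlib` module and on hypothetical table names; it is not DataJoint's internal algorithm.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the upstream tables it depends on.
dependencies = {
    "Session": set(),
    "Scan": {"Session"},
    "Stimulus": {"Session"},
    "Segmentation": {"Scan"},
    "Trace": {"Segmentation"},
}


def population_order(dependencies):
    """List tables so that each one follows all tables it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(table, dependencies):
    """Return every table that depends, directly or indirectly, on `table`."""
    affected = set()
    frontier = [table]
    while frontier:
        current = frontier.pop()
        for child, parents in dependencies.items():
            if current in parents and child not in affected:
                affected.add(child)
                frontier.append(child)
    return affected


order = population_order(dependencies)
print(order)  # one valid order; "Session" always comes first

# A cascading delete from Scan would also reach these tables.
print(sorted(downstream_of("Scan", dependencies)))  # ['Segmentation', 'Trace']
```

Several orders can be valid at once (here `Stimulus` and `Scan` are independent of each other), which is why the graph suggests possible workflows rather than dictating a single one.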
