DataJoint ERDs are DAGs
Have you noticed that DataJoint’s ERDs (entity-relationship diagrams) form directed acyclic graphs (DAGs)? For example, the following ERD depicts the preprocessing pipeline for two-photon imaging data in Andreas Tolias’ Lab (the code is at https://github.com/cajal/pipeline).

In this diagram, all the dependencies are directed downward. Every edge represents a foreign key in the downstream node that references the upstream one. Note that the arrows depict the direction of dependency, which is opposite to the direction of the foreign key.
Thus the ERD has no loops. This makes sense if you keep in mind that DataJoint is designed to support data pipelines, i.e., sequences of steps performed in the course of a study, from data acquisition to processing to analysis.
An investigator recently asked me whether DataJoint’s commitment to acyclic dependencies is a limitation of its representational power. After all, conventional E-R designs do not have a consistent direction and can form cycles. Textbooks on database design often feature tables with foreign keys into themselves.
For example, Panel A of the following figure depicts a textbook example of a cyclic relationship. A member of the Employee class may optionally have a manager who is also an Employee. This common design is often translated into a relational design with a table with a nullable foreign key referencing itself.

B) The same relationship refactored without cycles by adding the new entity Subordinate.
C) An equivalent DataJoint ERD.
However, the same relationship can be expressed with an acyclic design (Panel B) by introducing a new entity class Subordinate with two relationships to Employee: "is a" and "reports to". This design translates into two tables: Employee with no foreign keys into itself and Subordinate with two foreign keys into Employee. The first foreign key is defining: it forms the primary key of Subordinate. The second foreign key is made from dependent (non-primary-key) attributes. Panel C depicts the DataJoint ERD for this design.
The acyclic design has multiple advantages. The foreign keys are no longer nullable: if an employee does not report to anyone, her entry is simply absent from Subordinate. The data become easier to enter, modify, and delete. For example, employees can be entered in any order, and the reporting relationships can be entered afterward. Deleting a subset of employees becomes straightforward with a single cascading delete. With a self-referencing employee table, all these operations become problematic.
The Python code defining these two DataJoint classes would be as follows:
Python code for the Employee/Subordinate relationship
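import datajoint as dj

# The schema name is a placeholder; the original post does not specify one.
schema = dj.schema('company')

# Department is assumed to be declared earlier in the same schema.


@schema
class Employee(dj.Manual):
    definition = """ # company employee
    emp_id : int # employee id within the company
    ---
    fullname : varchar(120)
    date_of_birth : date
    hire_date : date
    -> Department
    """


# The second foreign key renames emp_id to reports_to. More recent DataJoint
# releases write this as: -> Employee.proj(reports_to='emp_id')
@schema
class Subordinate(dj.Manual):
    definition = """ # employee who reports to a manager
    -> Employee
    ---
    (reports_to) -> Employee
    """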
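The advantages claimed for the acyclic design can be tried out without a DataJoint server. The following is a minimal sketch using Python's built-in sqlite3 module, not DataJoint itself: the table layout mirrors Employee and Subordinate, but the SQL, the sample names, and the use of ON DELETE CASCADE are assumptions made for illustration and do not reflect how DataJoint implements its tables or its cascading deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Illustrative schema: employee has no self-reference; subordinate has two
# foreign keys into employee, the first of which is also its primary key.
conn.executescript("""
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    fullname TEXT NOT NULL
);
CREATE TABLE subordinate (
    emp_id     INTEGER PRIMARY KEY
               REFERENCES employee (emp_id) ON DELETE CASCADE,
    reports_to INTEGER NOT NULL
               REFERENCES employee (emp_id) ON DELETE CASCADE
);
""")

# Employees can be entered in any order because the table references nothing.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(3, "Carol"), (1, "Alice"), (2, "Bob")],  # made-up sample data
)
# Reporting relationships are entered afterward.
conn.executemany("INSERT INTO subordinate VALUES (?, ?)", [(2, 1), (3, 1)])

# The reports_to foreign key is not nullable: an employee without a manager
# is simply left out of subordinate rather than stored with a NULL.
try:
    conn.execute("INSERT INTO subordinate VALUES (1, NULL)")
    null_manager_rejected = False
except sqlite3.IntegrityError:
    null_manager_rejected = True

# Deleting an employee removes the dependent rows in one cascading step.
conn.execute("DELETE FROM employee WHERE emp_id = 2")
remaining_employees = [
    row[0] for row in conn.execute("SELECT emp_id FROM employee ORDER BY emp_id")
]
remaining_reports = conn.execute(
    "SELECT emp_id, reports_to FROM subordinate ORDER BY emp_id"
).fetchall()
```

In this sketch, deleting one employee also removes that employee's row in subordinate, while the other employees and their reporting relationships are left intact.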
Any ER design with a cyclic network of relationships can be refactored as a directed acyclic graph.
The synaptic connectivity example from yesterday’s post shows another cyclic relationship transformed into an acyclic one.
The directed acyclic structure of DataJoint’s pipelines makes them easier to interpret, gives their diagrams a predictable layout, and enables more consistent internal handling of dependencies (e.g., in cascading deletes). The downward flow of dependencies also suggests possible workflows: the tables at the top of the pipeline are populated first, and the next steps are inferred from the graph.
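The idea of inferring the order of steps from the graph can be sketched with a topological sort. The example below uses graphlib from the Python standard library (Python 3.9 or later) and does not use DataJoint; the dictionaries are hand-written stand-ins for the foreign keys of the Employee example, and the variable names are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Each table maps to the set of upstream tables that its foreign keys reference.
acyclic_design = {
    "Department": set(),
    "Employee": {"Department"},
    "Subordinate": {"Employee"},
}

# A topological order lists every table after all the tables it depends on,
# which is a valid order for entering or populating data.
populate_order = list(TopologicalSorter(acyclic_design).static_order())

# The textbook design, where Employee has a foreign key into itself,
# contains a cycle, so no such order exists.
self_referencing_design = {
    "Department": set(),
    "Employee": {"Department", "Employee"},
}
try:
    list(TopologicalSorter(self_referencing_design).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
```

For the acyclic design the sorter returns Department, then Employee, then Subordinate, whereas the self-referencing design is rejected with a CycleError.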