Written by
Dimitri Yatsenko, PhD
Founder • Chief Science & Technology Officer
June 23, 2017

DataJoint ERDs are DAGs

Have you noticed that DataJoint’s ERDs (entity-relationship diagrams) form directed acyclic graphs (DAGs)?  For example, the following ERD depicts the preprocessing pipeline for two-photon imaging data in Andreas Tolias’ Lab (the code is at https://github.com/cajal/pipeline).

[Figure: An entity-relationship diagram of a schema for processing two-photon imaging data from a resonant-scanning microscope.]

In this diagram, all the dependencies are directed downward.  Every edge is a foreign key from the downstream node to the upstream one.  Note that the arrows depict the direction of dependency (upstream to downstream), which is opposite to the direction of the foreign key.

Thus the ERD has no loops.  This makes sense if you keep in mind that DataJoint is designed to support data pipelines, i.e. sequences of steps performed in the course of a study, from data acquisition to processing to analysis.
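To make the "no loops" property concrete, here is a minimal sketch that checks whether a set of foreign-key dependencies forms a DAG. It uses Python's standard graphlib module rather than anything from DataJoint itself, and the table names are hypothetical, only loosely modeled on an imaging pipeline.

```python
from graphlib import TopologicalSorter, CycleError


def is_acyclic(dependencies):
    """Return True if the dependency graph has no cycles.

    `dependencies` maps each table name to the set of upstream tables
    that it references through foreign keys.
    """
    try:
        # prepare() raises CycleError if the graph contains any cycle
        TopologicalSorter(dependencies).prepare()
    except CycleError:
        return False
    return True


# Hypothetical pipeline: every table depends only on tables above it
pipeline = {
    "Session": set(),
    "Scan": {"Session"},
    "Segmentation": {"Scan"},
    "Trace": {"Segmentation"},
}

# A table with a foreign key into itself is a cycle of length one
textbook = {"Employee": {"Employee"}}

print(is_acyclic(pipeline))   # True
print(is_acyclic(textbook))   # False
```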

An investigator recently asked me whether DataJoint’s commitment to acyclic dependencies is a limitation of its representational power.  After all, conventional E-R designs do not have a consistent direction and can form cycles.  Textbooks on database design often feature tables with foreign keys into themselves.

For example, Panel A of the following figure depicts a textbook example of a cyclic relationship.  A member of the Employee class may optionally have a manager who is also an Employee.  This common design is often translated into a relational design in which a table has a nullable foreign key referencing itself.
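The self-referencing design can be tried out without a database server. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout, column names, and sample rows are assumptions made for illustration and are not taken from any DataJoint schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        fullname   TEXT NOT NULL,
        manager_id INTEGER NULL REFERENCES employee (emp_id)
    )
""")

# Entering a subordinate before her manager violates the foreign key
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Bob', 1)")
    inserted_out_of_order = True
except sqlite3.IntegrityError:
    inserted_out_of_order = False

# The manager has to be entered first, with NULL standing for "no manager"
conn.execute("INSERT INTO employee VALUES (1, 'Alice', NULL)")
conn.execute("INSERT INTO employee VALUES (2, 'Bob', 1)")

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The order of insertion matters here, and the NULL in manager_id carries meaning that the schema itself does not document.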

[Figure: three designs of the employee-subordinate relationship.]
A) A textbook cyclic relationship: an Employee may be managed by another Employee.
B) The same relationship refactored without cycles by adding the new entity Subordinate.
C) An equivalent DataJoint ERD.

However, the same relationship can be expressed with an acyclic design (Panel B) by introducing a new entity class Subordinate with two relationships to Employee: “is a” and “reports to”. This design translates into two tables: Employee, with no foreign key into itself, and Subordinate, with two foreign keys into Employee. The first foreign key is defining: it forms the primary key of Subordinate. The second foreign key is made up of dependent (non-primary) attributes. Panel C depicts the DataJoint ERD for this design.

The acyclic design has multiple advantages. The foreign keys are no longer nullable: if an employee does not report to anyone, she simply has no entry in Subordinate.  The data become easier to enter, modify, and delete. For example, employees can be entered in any order, and the reporting relationships can be entered afterward. Deleting a subset of employees becomes straightforward with one step of cascading delete. With a self-referencing employee table, all these operations become problematic because a row may depend on other rows of the same table.

The Python code defining these two DataJoint classes would be as follows:

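import datajoint as dj

schema = dj.schema('company')  # hypothetical schema name

# Department is assumed to be declared elsewhere in the same schema.
@schema
class Employee(dj.Manual):
    definition = """ # company employee
    emp_id : int # employee id within the company
    ---
    fullname : varchar(120)
    date_of_birth : date
    hire_date : date
    -> Department
    """

# The second foreign key renames emp_id to reports_to.
# (Newer DataJoint releases write it as -> Employee.proj(reports_to='emp_id').)
@schema
class Subordinate(dj.Manual):
    definition = """  # employee who reports to a manager
    -> Employee
    ---
    (reports_to) -> Employee
    """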
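The advantages listed above (no nullable keys, entry in any order, one-step cascading delete) can be illustrated with a small sqlite3 sketch of the same two tables. This is not how DataJoint implements them: the SQL layout, the ON DELETE CASCADE clauses, and the sample rows are assumptions chosen to mimic the behavior described in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign-key enforcement
conn.executescript("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        fullname TEXT NOT NULL
    );
    CREATE TABLE subordinate (
        emp_id     INTEGER PRIMARY KEY
                   REFERENCES employee (emp_id) ON DELETE CASCADE,
        reports_to INTEGER NOT NULL
                   REFERENCES employee (emp_id) ON DELETE CASCADE
    );
""")

# Employees can be entered in any order...
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(2, "Bob"), (3, "Carol"), (1, "Alice")],
)
# ...followed by the reporting relationships. Alice reports to no one,
# so she has no row in subordinate and no NULL is needed.
conn.executemany("INSERT INTO subordinate VALUES (?, ?)", [(2, 1), (3, 1)])

# Deleting Alice removes, in one step, every relationship that mentions her
conn.execute("DELETE FROM employee WHERE emp_id = 1")

remaining_employees = [
    row[0] for row in conn.execute("SELECT emp_id FROM employee ORDER BY emp_id")
]
remaining_reports = conn.execute("SELECT COUNT(*) FROM subordinate").fetchone()[0]
```

After the delete, Bob and Carol remain as employees, and the reporting relationships that pointed to Alice are gone.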

Any ER design with a cyclic network of relationships can be refactored as a directed acyclic graph.

The synaptic connectivity design from yesterday’s post is another example of transforming a cyclic relationship into an acyclic one.

Because DataJoint’s pipelines are directed and acyclic, their diagrams are easier to interpret and have a predictable layout, and dependencies can be handled more consistently internally (e.g. in cascading deletes). The downward flow of dependencies also suggests possible workflows: the tables at the top of the pipeline are populated first, and the next steps are inferred from the graph.
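As a rough illustration of how a workflow can be inferred from the graph, the sketch below derives a population order and a deletion order from the dependencies of the Employee example. It relies on Python's standard graphlib module and is an assumption about how one might reason over the graph, not a description of DataJoint's internals.

```python
from graphlib import TopologicalSorter

# Each table maps to the upstream tables it references
dependencies = {
    "Department": set(),
    "Employee": {"Department"},
    "Subordinate": {"Employee"},
}

# Upstream tables come first: a valid order for entering data
populate_order = list(TopologicalSorter(dependencies).static_order())

# The reverse order is safe for deleting: dependents go before their parents
delete_order = populate_order[::-1]

print(populate_order)  # ['Department', 'Employee', 'Subordinate']
print(delete_order)    # ['Subordinate', 'Employee', 'Department']
```

Such an ordering exists only because the graph is acyclic; with a cycle, static_order() raises graphlib.CycleError.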
