Written by
Dimitri Yatsenko, PhD
Founder • Chief Science & Technology Officer
June 21, 2017

An algebra of entity sets

The DataJoint model is a synthesis of the relational data model and the entity-relationship (E-R) model. It preserves the logical rigor of the relational model while retaining the conceptual clarity of the E-R model. DataJoint is both models rolled into one: the basic units of a DataJoint pipeline are entity classes that are also relation variables.

The E-R model and the relational data model both concern the structure of the database. The relational model, however, also supports data queries, and it provides two distinct paradigms for them: relational algebra and relational calculus. The latter became the foundation for SQL.

DataJoint unifies the two data models for data queries too by extending E-R concepts into its query language. That language most closely resembles relational algebra, but since it preserves entity integrity in all its operations, it can also be thought of as an algebra of entity sets, or an entity set algebra.

DataJoint’s operators correspond to similar operations of relational algebra but are modified and restricted in ways that ensure that both the inputs and the outputs are meaningful entity sets.  Derived relation variables resulting from DataJoint expressions may be thought of as new entity classes with defining foreign keys into their input classes.  For example, the expression

experiment.Mouse & stimulus.Trial

can be thought of as a new table with a defining foreign key into experiment.Mouse whereas

experiment.Mouse * stimulus.Trial

can be thought of as a new table with defining foreign keys into both experiment.Mouse and stimulus.Trial.
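The difference between the two expressions can be illustrated without a database. The following is a minimal pure-Python sketch, not DataJoint itself: an entity set is modeled as a list of dicts plus a tuple naming its primary key, and the attribute names (mouse_id, trial_id, sex, contrast), the sample values, and the helper functions restrict and join are all assumptions made up for this illustration.

```python
# Toy model of entity sets: a primary key plus a list of rows (dicts).
# Attribute names and values are invented for illustration only.
mouse = {
    "key": ("mouse_id",),
    "rows": [
        {"mouse_id": 1, "sex": "F"},
        {"mouse_id": 2, "sex": "M"},
        {"mouse_id": 3, "sex": "F"},
    ],
}
trial = {
    "key": ("mouse_id", "trial_id"),
    "rows": [
        {"mouse_id": 1, "trial_id": 1, "contrast": 0.5},
        {"mouse_id": 1, "trial_id": 2, "contrast": 1.0},
        {"mouse_id": 3, "trial_id": 1, "contrast": 0.5},
    ],
}


def shared(a, b):
    """Attribute names present in both sets (read from the first row of each)."""
    return sorted(set(a["rows"][0]) & set(b["rows"][0]))


def restrict(a, b):
    """Analogue of `a & b`: entities of `a` that match at least one entity of `b`."""
    names = shared(a, b)
    seen = {tuple(row[n] for n in names) for row in b["rows"]}
    rows = [row for row in a["rows"] if tuple(row[n] for n in names) in seen]
    # The result still identifies entities by the primary key of `a`.
    return {"key": a["key"], "rows": rows}


def join(a, b):
    """Analogue of `a * b`: natural join on the shared attributes."""
    names = shared(a, b)
    rows = [
        {**ra, **rb}
        for ra in a["rows"]
        for rb in b["rows"]
        if all(ra[n] == rb[n] for n in names)
    ]
    # The result is identified by the combined primary keys of both inputs.
    key = a["key"] + tuple(k for k in b["key"] if k not in a["key"])
    return {"key": key, "rows": rows}


restricted = restrict(mouse, trial)  # mice that have at least one trial
joined = join(mouse, trial)          # one entity per (mouse, trial) pair
```

In this sketch the restricted set keeps the key ("mouse_id",), so it still refers to mice, while the joined set is keyed by ("mouse_id", "trial_id") and so refers to both inputs.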

DataJoint’s restriction, projection, and aggregation operators preserve the primary key of the entity set they operate on, so each entity in the result corresponds to an entity in the original set. Their results can be thought of as new computed tables. For this reason, DataJoint’s projection operator cannot project out the primary key attributes.
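To make the constraint on projection concrete, here is a small pure-Python sketch under the same kind of toy assumptions: the entity set is a dict with a primary key and rows, and the project helper and the attribute names are hypothetical, not part of DataJoint.

```python
# Toy entity set; names and values are invented for illustration only.
mouse = {
    "key": ("mouse_id",),
    "rows": [
        {"mouse_id": 1, "sex": "F", "strain": "C57"},
        {"mouse_id": 2, "sex": "M", "strain": "C57"},
    ],
}


def project(entity_set, *attributes):
    """Keep the requested attributes, always retaining the primary key."""
    key = entity_set["key"]
    keep = list(key) + [name for name in attributes if name not in key]
    rows = [{name: row[name] for name in keep} for row in entity_set["rows"]]
    # Same primary key, same number of entities: only secondary attributes change.
    return {"key": key, "rows": rows}


sexes = project(mouse, "sex")  # primary key plus one secondary attribute
keys_only = project(mouse)     # nothing requested: the primary key remains
```

Because the key is never dropped, the projection cannot merge two distinct entities into one row, which is what would happen if only strain were kept in this example.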

DataJoint’s aggr operator plays the role of the GROUP BY clause in SQL or the aggregation operation in relational algebra. Unlike its counterparts, DataJoint’s aggr operator aggregates one entity set with respect to another rather than grouping by an arbitrary set of attributes.
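The idea of aggregating one entity set with respect to another can be sketched in plain Python. This is an illustration only, not DataJoint's implementation: the aggregate helper, the attribute names, and the sample values are assumptions, and dropping entities that have no matches is a choice made for this sketch.

```python
# Toy entity sets; names and values are invented for illustration only.
mouse = {
    "key": ("mouse_id",),
    "rows": [
        {"mouse_id": 1, "sex": "F"},
        {"mouse_id": 2, "sex": "M"},
        {"mouse_id": 3, "sex": "F"},
    ],
}
trial = {
    "key": ("mouse_id", "trial_id"),
    "rows": [
        {"mouse_id": 1, "trial_id": 1, "contrast": 0.5},
        {"mouse_id": 1, "trial_id": 2, "contrast": 1.0},
        {"mouse_id": 3, "trial_id": 1, "contrast": 0.5},
    ],
}


def aggregate(a, b, **computations):
    """Summarize the entities of `b` that match each entity of `a`."""
    names = sorted(set(a["rows"][0]) & set(b["rows"][0]))
    result = []
    for ra in a["rows"]:
        group = [rb for rb in b["rows"] if all(ra[n] == rb[n] for n in names)]
        if not group:
            continue  # sketch choice: skip entities of `a` with no matches
        row = {k: ra[k] for k in a["key"]}
        row.update({name: func(group) for name, func in computations.items()})
        result.append(row)
    # The grouping is fixed by the primary key of `a`, not chosen freely.
    return {"key": a["key"], "rows": result}


summary = aggregate(
    mouse,
    trial,
    n_trials=len,
    mean_contrast=lambda g: sum(r["contrast"] for r in g) / len(g),
)
```

The result is again a set of mice: it has the primary key of the first argument, and each row carries values computed from that mouse's trials.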

The same principle of preserving entity integrity permeates all of DataJoint’s operators and is the source of its lucidity. To differentiate DataJoint’s query language from relational algebra, we can therefore refer to it as an algebra of entity sets, and to its operators as entity set operators.
