Why reinvent the relational model?
An introductory course in Database Systems will likely present two closely related data models: the Entity-Relationship Model (ERM) and the Relational Data Model (RDM). The ERM is useful for conceptual modeling of real-world entities, their attributes, and relationships between entities of different classes. The RDM supports logical modeling suitable for implementation. Much of the course will focus on how to convert ERM designs into the RDM; such conversion remains an art even though automated tools have been proposed. Furthermore, the process is irreversible: an ERM design cannot be straightforwardly recovered from its RDM counterpart.

Database programmers rarely bother with a formal ERM design. With experience they learn to model entities and relationships in their heads and churn out SQL table declarations. Just as a Zion operator in the Matrix movies perceives the state of the Matrix from the code raining down her green screen, database programmers infer the underlying conceptual design from existing table declarations and foreign key constraints defined by others. Tools for reverse-engineering database schemas do not quite recover the entity-relationship design, but they visualize the structure of tables and foreign key constraints, which makes that design easier to infer.
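To make this concrete, here is a minimal sketch of what such inference starts from. It uses Python's built-in sqlite3 module and two hypothetical tables, `subject` and `session`, invented purely for illustration. The schema only records which column references which table; the reading "a session belongs to a subject" is left for the programmer to reconstruct.

```python
import sqlite3

# Hypothetical two-table schema, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    subject_id INTEGER PRIMARY KEY,
    species    TEXT NOT NULL
);
CREATE TABLE session (
    subject_id INTEGER NOT NULL,
    session_no INTEGER NOT NULL,
    PRIMARY KEY (subject_id, session_no),
    FOREIGN KEY (subject_id) REFERENCES subject (subject_id)
);
""")

def foreign_keys(conn, table):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    # PRAGMA foreign_key_list yields rows of the form
    # (id, seq, parent_table, from_column, to_column, on_update, on_delete, match).
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return [(table, row[3], row[2], row[4]) for row in rows]

for name in ("subject", "session"):
    print(name, foreign_keys(conn, name))
```

The output lists one link from `session.subject_id` to `subject.subject_id` and nothing more. Whether that link expresses ownership, membership, or some other relationship is exactly the conceptual information that the table declarations leave implicit.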
Why do we need two data models to design one database? Why not have a single data model that can be used for both conceptual modeling and implementation? Why is the ERM unsuitable for logical modeling, and why is the RDM a poor conceptual model?
I will speculate that part of the problem lies in the chronology of the two inventions. Edgar F. Codd defined the RDM in 1969, whereas Peter Chen did not introduce the ERM until much later, in 1976. The RDM was inspired by the mathematical concept of relations from set theory. A relation is defined as a subset of the Cartesian product of several sets (domains). Although his descriptions implied that relations corresponded to sets of real-world entities of various types, Codd formulated his model in much more general and abstract terms. By the time the ERM was described, relational concepts were already firmly ingrained.
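The set-theoretic definition can be sketched in a few lines of Python. The domains and tuples below are hypothetical, chosen only to illustrate the idea that a relation is a subset of a Cartesian product.

```python
from itertools import product

# Hypothetical domains, invented for illustration only.
animals = {"mouse1", "mouse2"}
session_days = {1, 2, 3}

# The Cartesian product holds every conceivable (animal, day) pair.
cartesian = set(product(animals, session_days))

# A relation is any subset of that product, for example the pairs
# that correspond to sessions that were actually recorded.
recorded = {("mouse1", 1), ("mouse1", 2), ("mouse2", 3)}

assert recorded <= cartesian          # subset test
print(len(cartesian), len(recorded))  # 6 possible pairs, 3 in the relation
```

Note that nothing in this definition says what a tuple stands for. The pairs could describe entities, relationships, or arbitrary facts, which is the generality the paragraph above attributes to Codd's formulation.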
I will further speculate that had the chronology been reversed and had the relational model been constrained by E-R concepts, many of its core definitions and operations would have turned out quite differently. Perhaps we would have a relational-like model that kept its focus on modeled entities and their relationships. Then perhaps this model would suit the needs of both conceptual and logical design. Furthermore, abstract and arcane concepts such as functional dependencies and normal forms might have been formulated in much more approachable terms, such as the proper delineation of entities.
The core idea of DataJoint is to reformulate the Relational Data Model to prioritize its effectiveness in the role of an Entity-Relationship Model. The resulting data model should obviate the need for two separate design processes or, since formal ERM design is rare in practice, greatly improve the conceptual aspects of the relational data model.
This unification of conceptual and logical modeling requires major revisions of many established concepts in traditional database design. Since SQL has long been the lingua franca of relational databases, we will often contrast solutions in DataJoint with those in SQL. Most DataJoint users learn database programming without ever touching SQL. Even for them, such examples may still help clarify basic concepts. For users who already know SQL and relational concepts from other sources, the examples will help map their knowledge to DataJoint.
