Strong Entity Integrity: Part 3 — Restriction
We use the term restriction to refer to the selection of a subset of an entity set. If database terms, restriction selects a subset of rows from a table whereas projection selects columns. We avoided the term selection because in SQL queres, confusingly, the SELECT clause performs projection whereas the WHERE clause performs selection (restriction).
Since restriction does not affect the heading of its argument, restriction may seem as the most benign operation when it comes to entity integrity. It stands to reason that, if r is a set of hammers, then its subset

or the SQL query
SELECT
*
FROM
r
WHERE
color='red'
will also represent hammers.
In DataJoint, the equivalent expression is
r & 'color="red"'
and its primary key is the same as r‘s.
Is this reasoning bullet proof?
What if we restrict by an equality condition on a primary key attribute?
For example, if the primary key of r is (a,b), then r & 'a=5' will yield a relation where every tuple can be uniquely identified by b. From the purely relational point of view, b becomes the primary key of the result, i.e. the minimal subset of attributes uniquely identifying each tuple. a is no longer necessary at all since its value is known from the restriction.
This is where DataJoint makes a compromise in favor of the ERM and entity integrity: DataJoint keeps the original primary key even if it’s redundant — for the sake of entity integrity. DataJoint keeps a in the primary key of the result, which means that it cannot be projected out in subsequent operations (see Part 2 of this series).
The SQL equivalent of DataJoint’s (r & 'a=5').proj('c') if the primary key of r is (a, b) is
SELECT
a, b, c
FROM
r
WHERE
a=5
In SQL you might prefer to omit the a since we do not need to retrieve its value:
SELECT
b, c
FROM
r
WHERE
a=5
Although potentially more efficient, the SQL result has lost its entity class. Or put in another way, this query introduces a new unnamed implicit entity class with the primary key of b.
DataJoint’s mantra is “preserve entity integrity” even if it costs some efficiency (usually very little). True, it is possible that an experienced programmer can re-write a DataJoint query more efficiently in SQL. But the gain in clarity and logical coherence is well worth the cost. DataJoint enforces strong entity integrity for the sake of the human scientist and not to help the query compiler be more efficient. DataJoint’s query model ensures that new entities are not introduced implicitly willy-nilly mid-sentence in queries. The only way to introduce a new entity class is to explicitly define it.
Related posts
Entrepreneurs of Insight
A Better Data Engine for Brain Science
Data needs direction: five clarifications for database design
Updates Delivered *Straight to Your Inbox*
Join the mailing list for industry insights, company news, and product updates delivered monthly.
