Written by
Dimitri Yatsenko, PhD
Founder • Chief Science & Technology Officer
July 31, 2017

Strong Entity Integrity: Part 3 — Restriction

Dimitri Yatsenko, PhD
Founder • Chief Science & Technology Officer

We use the term restriction to refer to the selection of a subset of an entity set. If database terms, restriction selects a subset of rows from a table whereas projection selects columns. We avoided the term selection because in SQL queres, confusingly, the SELECT clause performs projection whereas the WHERE clause performs selection (restriction).

Since restriction does not affect the heading of its argument, restriction may seem as the most benign operation when it comes to entity integrity. It stands to reason that, if r is a set of hammers, then its subset

\sigma_{\text{color}=\text{`red'}}(r)

or the SQL query

SELECT * FROM r

WHERE color='red'

will also represent hammers.

In DataJoint, the equivalent expression is

r & 'color="red"'

and its primary key is the same as r‘s.

Is this reasoning bullet proof?

What if we restrict by an equality condition on a primary key attribute?

For example, if the primary key of r is (a,b), then r & 'a=5' will yield a relation where every tuple can be uniquely identified by b. From the purely relational point of view, b becomes the primary key of the result, i.e. the minimal subset of attributes uniquely identifying each tuple. a is no longer necessary at all since its value is known from the restriction.

This is where DataJoint makes a compromise in favor of the ERM and entity integrity: DataJoint keeps the original primary key even if it’s redundant — for the sake of entity integrity. DataJoint keeps a in the primary key of the result, which means that it cannot be projected out in subsequent operations (see Part 2 of this series).

The SQL equivalent of DataJoint’s (r & 'a=5').proj('c') if the primary key of r is (a, b) is

SELECT a, b, c

  FROM r

  WHERE a=5

In SQL you might prefer to omit the a since we do not need to retrieve its value:

SELECT b, c

  FROM r

  WHERE a=5

Although potentially more efficient, the SQL result has lost its entity class. Or put in another way, this query introduces a new unnamed implicit entity class with the primary key of b.

DataJoint’s mantra is “preserve entity integrity” even if it costs some efficiency (usually very little). True, it is possible that an experienced programmer can re-write a DataJoint query more efficiently in SQL. But the gain in clarity and logical coherence is well worth the cost. DataJoint enforces strong entity integrity for the sake of the human scientist and not to help the query compiler be more efficient. DataJoint’s query model ensures that new entities are not introduced implicitly willy-nilly mid-sentence in queries. The only way to introduce a new entity class is to explicitly define it.

Related posts

Updates Delivered *Straight to Your Inbox*

Join the mailing list for industry insights, company news, and product updates delivered monthly.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.