Optional dependencies
When a foreign key is a secondary attribute, its value may made optional in one of three ways.
Let’s say that an attribute of the entity set Student
is the student’s major and the valid majors are in Major
. We need to allow for students with and without majors. Here are the three ways to define this relationship with their respective strengths and weaknesses.
Solution 1: Make the foreign key nullable
Definition of nullable foreign keys is described in docs.datajoint.io. However, this feature is only scheduled for release in the upcoming datajoint-python version 0.11.
The definitions of the two tables would appear as follows:
:: Major
major : char(8) # abberviated name, e.g. BIOL, PHYS, NEUROSCI
---
major_name : varchar(255) # full major name
-> Department
:: Student
student_id : int
---
full_name : varchar(255) # e.g. DOE, Jane S
date_of_birth : date
sex : enum('M','F','U')
-> [nullable] Major
Solution 2: A separate association table
If an attribute or attributes are optional, sometimes it makes more sense to put them in a separate dependent table. For the example above, we would leave out the major from the student table and add a new table StudentMajor
to associate a student with a major:
:: Student
student_id int
---
full_name : varchar(255) # e.g. DOE, Jane S
date_of_birth : date
sex : enum('M','F','U')
:: StudentMajor
-> Student
---
-> Major
Solution 3: Special value
Finally, one could keep the dependency required but add a special value to Major
to indicate missing major, for example, ‘NONE’ or ‘UNDECL’.
Then Student
will be declared as
:: Student
student_id : int
---
full_name : varchar(255) # e.g. DOE, Jane S
date_of_birth : date
sex : enum('M','F','U')
-> Major
Which solution is best?
Right away, let’s state that Solution 3 is rarely optimal. Special values to indicate missing values are generally considered poor technique because they require additional implicit semantics and rules. However, when such special values are institutionalized, this choice may be justified.
The choice between Solutions 1 and 2 may depend on other considerations. For example, if an attribute may change its value, it may be better to place it in a separate table as in Solution 2 rather than to keep it together with the permanent attributes. For example, can a student declare her major or change her major later on? Under Solution 1, changing the major would require an update of the existing student record. In contrast, under Solution 2, the major can be changed without perturbing Student
.
In DataJoint, tuples are considered immutable and the proper way to manipulate data is through inserts and deletes rather than updates. Even though updates are implemented, they are more of a workaround for special cases.
Since Solution 2, has always been available, we are only now getting around to implement nullable foreign keys for Solution 1.
If the optional attribute is indeed permanent, then Solution 1 may be preferable because it saves us an extra table and potentially simplifies some queries. However, as always, dealing with NULL values may lead to some counterintuitive results in queries involving comparisons. Intuitive implementation of operators involving NULLs is what has kept me from promoting nullable foreign keys. We are working to define them more most logically.
For example, the results of Student - Major
under Solution 1, would always produce the empty set with the current implementation of restriction since MySQL’s IN
and NOT IN
operators always yield NULL when testing NULL values. We will fix the logic so that Student - Major
would yield students where major
is NULL.
Related posts
Entrepreneurs of Insight
A Better Data Engine for Brain Science
Data needs direction: five clarifications for database design
Updates Delivered *Straight to Your Inbox*
Join the mailing list for industry insights, company news, and product updates delivered monthly.
