Strong Entity Integrity: Part 5 — Aggregation
DataJoint’s principle of Strong Entity Integrity (SEI) requires that queries must not introduce new classes of entities: projection and restriction operators preserve the entity class of its operand whereas the join operator yields a new entity class but one comprising the combination of the entity classes of its operands. A DataJoint query make obvious the entity class of its result.
One operator that most egregiously violates this principle in relational algebra and SQL is the aggregation operator also known as GROUP BY. In its traditional formulation, aggregation allows performing summary operations on groups of tuples identified by a combination of grouping attributes. The effective primary key of the result are the grouping attributes. Therefore the query creates a new type of entity in the middle of a query that is not traced to any explicitly defined entity class. SQL allows a further deviation from entity integrity with its ROLLUP functionality producing results comprising mixtures of entity classes with different primary keys.
DataJoint respects SEI and defines aggregation as a binary operator allowing summary operations on subsets of entity set B grouped by entities from class A. It has the following notation: A.aggr(B, ...') where ... is a list of aggregation computations such as n='sum(attr)`. In this way aggregation is a form of projection, preserving the entity class of A.
A.aggr(B, ...) is equivalent to SELECT ... FROM B GROUP BY . As always, the primary key cannot be omitted.
DataJoint simply makes explicit what programmers imply implicitly when they perform a GROUP BY operation. When grouping, we conceive of some sort of entity represented by the grouping attributes. In most cases, that entity class is already defined, often in the form of a table. Yet SQL forces users to spell out the list of attributes rather than explicitly specify the aggregating entity set.
For example, consider the following table definition for college course grades:
@schema
class
Grade(dj.Manual):
definition
=
"""
-> Student
-> Course
---
grade: decimal(3, 2) # grade e.g. 3.67
"""
Then studentss GPA can be computed as
Student().aggr(Grade(), gpa
=
'AVG(grade)')
The entity class of the result is the same as that of the aggregating entity: the result is still a set entities of class Student.
Related posts
Optional dependencies
Aggregation functions within restriction conditions
Strong Entity Integrity: Part 7 — Division
Updates Delivered *Straight to Your Inbox*
Join the mailing list for industry insights, company news, and product updates delivered monthly.
