Tuesday, 10 January 2017

Surrogate Keys

In this post, we see the advantages of using a surrogate key in a datawarehouse. But, first, a brief look at natural keys in OLTP applications. Natural keys are created by using data that constitute that record and uniquely identify records in a relational database. A surrogate key, by contrast, is just an integer and nothing can be inferred from it. The reasons why surrogate keys are preferred in a datawarehouse are:

1) Surrogate keys occupy less space than natural keys

Natural keys are usually alphanumeric strings like '978-3-16-148410-0' (a 13 digit ISBN number that uniquely identifies a book) and occupy more space compared to a surrogate integer key like 1010102 (a random integer). This can be seen from the below query:

select vsize('978-3-16-148410-0') "Natural Key", vsize(1010102) "Surrogate Key" from dual;

returns result shown below:



VSIZE function in Oracle returns the number of bytes in the internal representation. We can can see that the number of bytes is more in case of natural key

A direct result of this is that related indexes also will occupy less space in case of surrogate keys resulting in better performance

2) Surrogate keys help in maintaining historical context

Datawarehouses maintain history for records in OLTP applications. So, in case a record in OLTP application is modified or deleted, then, while the natural key of that record may be lost in OLTP applications, the corresponding record is preserved in a datawarehouse for analysis in future using slowly changing dimensions. While a natural key may be updated because of any new business requirement within an organization or because of any acquisition or merger of that organization with another organization, surrogate keys on the other hand are buffered from these changes