Saturday, July 20, 2024

Primary Key

A Primary Key (commonly referred to as PK) is a specific type of Unique MySQL Index that must not contain any NULL values. A database table is permitted to have only one Primary Key and for certain Storage Engines the Primary Key is optional. As a Best Practice it is recommended a table always have a Primary Key in order to ensure a unique means of retrieving a single row of data.

With MySQL there are additional rules and constructs with Primary Keys. If an integer column is specified with the AUTO_INCREMENT attribute, this column must be contained within the Primary Key. While an AUTO_INCREMENT value is actually unique, a Primary Key may contain additional columns. The InnoDB Storage Engine requires a table to have a Primary Key. If one is not specified InnoDB will use a Unique Key if defined. If there is no Unique Key Index InnoDB will then internally create a 5 byte Primary Key.

A Primary Key Index is generally a BTREE Index however this also can vary based on Storage Engine. The MEMORY storage engine uses a HASH Index type by default. The InnoDB Storage Engine creates a clustered BTREE Primary Key Index enabling a better sequential read retrieval of rows when ordered by the Primary Key.

There is no one Best Practice for defining a Primary Key. It is a common practice to create a surrogate key using an AUTO_INCREMENT integer values, and this can then be a standard to be applied to all tables. Using a natural unique key can be problematic for intersection or child tables.

Using a good naming standard for Primary Keys is important. The definition of the same column name for all primary keys is not a good practice, for example ‘id’ or ‘pk’. It is better to correctly identify your primary key column, e.g. ‘product_id’, ‘invoice_id’ as this helps with clear identification in SELECT statements with joins, and also with foreign key references. A small syntax benefit is the use of the USING syntax for INNER JOIN definitions.

While it may be considered that a UUID column is a good candidate for a Primary Key, the use of this with InnoDB can be problematic due to the page fill factor of a non sequential key, and the size can affect the size of Secondary Indexes.