MySQL Indexing –
InnoDB Indexes
A few
days back I was reading about MySQL indexes, more specifically InnoDB
indexes, to better understand query performance and optimization. So
I thought of sharing some information on this topic. Indexes are
basically structures that help the database engine in finding
(retreiving) the records faster. An opposite to index lookup is full
scan. Think of a full table scan as going through all the rows in a
table and selecting the right row.
A
common example that is generally given when explaining indexes is a
book's index. To look up for a topic in any book, you either look up
for the topic in an index or scan the whole book page by page.
Obviously if the book has less pages, it is viable to go page by page
and scan for a topic but if the book has decent number of pages than
using an index is a smart and efficient approach. Same is the case
with database indexes.
InnoDB
Indexes
MySQL InnoDB Index Structure
uses B+ Tree structure to store its data. B+ Tree structure is a
different topic. I will be writing a blog on how data in B+ Trees is
organized and how insert, update and delete effects the tree
Clustered Index
Clustered Index is an
approach to store data. Think of a Clustered Index as a tree
structure (index), with data rows as leaves and primary key as nodes
above the leaves. InnoDB clusters the data by primary key. Below are
some points to remember regarding the Clustered Index:
- As stated earlier, InnoDB clusters the data by primary key. If the table has a primary key, MySQL will use this primary key for Clustered Index.
- If the table does not have a primary key, MySQL selects the first UNIQUE and NOT NULL index for Clustered Index.
- If none of the above applies, MySQL will generate a hidden (6 byte) field that contain row IDs. MySQL will use this hidden field to cluster data (as Clustered Index).
As Clustered Index holds
both the data and primary key on the same page, row access is faster
because no additional disk I/O is needed. On the other hand, in case
of MyISAM, an additional disk I/O is needed as index and data are not
on the same page. Below are some points worth mentioning:
- Insertion speeds depend on how data is inserted into the table. Insertions are fast if data is inserted in primary key order. A bad approach is to insert data randomly (with random keys) but a good approach is to have sequential keys (like AUTO_INCREMENT)
- Updating primary key may not be a good idea as it forces each updated row to be moved to a different location. Moving rows to a new location may lead to page splits which causes a table to use more disk space.
- Though Clustered Indexes are efficient in terms of retreival, defining a clustered key having many columns may be a disadvantage. It would be clear why its a disadvantage once you read about the secondary index.
Secondary Indexes
Secondary Indexes are also
called non-clustered index. Unlike clustered index, they do not store
row data as leaves but they store primary key (clustered index) as
leaves. It is due to this reason that it is advised to keep primary
key short. The size of primary key will effect the size of a
secondary index. As far as lookups are concerned, secondary index
look up requires two steps. One to get primary keys matching the
secondary key lookup and after that fetching the actual data by
looking up the primary keys that are fetched before, from the
clustered index.