Clustered Index - keypoints, advantages and disadvantages (Part 3)

Clustered index - keypoints

  • Clustered index is just a different approach of data storage. This is not a different type of index, and not all the engines support it.
  • Rows with adjacent key values are stored close to each other. 
  • It is possible to have only one cluster index per table, but you can use different keys and build a secondary index

    Advantages of clustered index:

  • Related data is stored close to eachother leading to less disk I/O  while retrieving sequential or range data
  • Faster data access as data and index are stored together at leaf node and leaf node are pointing to each other using pointers -Performance

Disadvantages of clustered index:

  • Data insertion speed is dependent on the order of Primary Key when there is a clustered index on the table
    • Table needs to be optimized if inserted data is not ordered by Primary Key
  • Clustered index have minimal impact for in memory data
  • Updating the clustered index column is expensive (resource wise) as data is moved based on its size to different location
  • Page split occurs when new data is inserted leading to fragmentation
  • Alot of fragmentation will lead to bad performance of entire server
  • Secondary index (non cluster index) does not store actual data, but it contains the location of the clustered index instead of raw pointer, hence the size is larger
    • if there is a clustered index, secondary index will contain the pointer to the clustered index
    • if there is no clustered index, secondary index will contain the row pointer to the actual data


In clustered innodb we only have data in leaf node.  Root and Intermediate Node will only contain a pointer to the next level node.

 

Let's say we want to retrieve Raw 22, mysql engine will traverse from Root node, to intermediate node and end up to Leaf Node.
If we want to retrieve values between raw 22 and 32, then it will use a forward pointer to retrieve the data from the neighbor leaf. This saves alot of I/O.

 

It will first ask the Root node where this value resides at, Root node will send to intermediate node 1-30 which contains location of raw 1-30 -> then to leaf node

Leaf node will contain entire data ordered by key column

Secondary Index for Innodb

 

So the question we have, why is there a need to use a secondary index?
The answer is that in practice it is not possible always to honor our cluster index as per all our queries. If the first query is for the first name and the second query is for the last name we clearly need different index.
So if we use the primary index for the first name we can use the secondary index for the last name.


  • clustered index
  • 0 Users Found This Useful
Was this answer helpful?

Related Articles

 Hash Index (Part 4)

Hash index Just like B-tree index, we also have a concept of a hash index. A hash index is...

 InnoDB versus MyISAM (Part 1)

Below we want to outline differences between InnoDB and MyIsam engine in mysql.   InnoDB...

 B-TREE and B+TREE Index (Part 2)

When you hear in the industry the word index, they often mean a B-tree index. Most of the MySQL...

 Indexing Strategies for High Performance (Part 6)

Indexes are called keys in MySQL. They are data structures that storage engines use to find rows...

 Creating Indexes for a table (Part 7)

Now, we will further continue with this demonstration and create indexes on the table film. Here...