Total index maintenance

There is no magic in indexes. Instead, it is all about timing and balance.

A few quick words on index structures – nothing deep here; if you are reading this, you probably already know enough about them:

  • indexes are organized as B-trees, i.e. hierarchical structures of pages pointing to each other
  • there are several kinds of index pages: the root page, intermediate pages (any number of levels, depending on the index size) and leaf-level pages
  • a clustered index contains the actual data rows of the table (or partition) in its leaf-level pages, while a non-clustered index contains pointers in its leaf-level pages
  • hence, non-clustered indexes are additional structures which can help query performance but are not critical to the table’s data, and can easily be re-created if damaged (unlike clustered indexes)
  • each index has a statistics object standing behind it
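
To make the page hierarchy tangible, here is a quick sketch (assuming SQL Server 2005 or later; `dbo.SalesOrders` is a hypothetical table name) that shows the levels of each index on a table:

```sql
-- Show the B-tree levels of every index on a (hypothetical) dbo.SalesOrders table.
-- index_level 0 is the leaf level; the highest level is the root page.
SELECT i.name AS index_name,
       ps.index_level,
       ps.page_count,
       ps.record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.SalesOrders'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
ORDER BY i.name, ps.index_level DESC;
```

A small index typically shows only a root and a leaf level; the intermediate levels appear as the index grows.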

 

Here we get to the topic of index maintenance: why and when is it important?

Answer: index maintenance is essential for good query performance. Note that index maintenance is a much broader topic than just index rebuilds and reorganizes. It includes:

  • index defragmentation (index rebuild and reorganize)
  • index analysis and usage trending
  • index options adjustments
  • index strategy (index creation, removal, checking for duplicates, unification, segregation, etc.)
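
As a minimal sketch of the defragmentation part, fragmentation can be surveyed with a DMV query like the one below. The 10%/30% thresholds are the commonly cited rules of thumb, not hard rules – adjust them to your own workload:

```sql
-- Check average fragmentation of each index and print a suggested action.
-- The 10% / 30% thresholds are rules of thumb, not gospel.
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name                    AS index_name,
       ps.avg_fragmentation_in_percent,
       CASE
           WHEN ps.avg_fragmentation_in_percent >= 30 THEN 'REBUILD'
           WHEN ps.avg_fragmentation_in_percent >= 10 THEN 'REORGANIZE'
           ELSE 'leave alone / update statistics'
       END AS suggested_action
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE ps.page_count > 1000;   -- small indexes are rarely worth defragmenting
```
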

In the contemporary database world, index fragmentation can be caused by several factors: database shrinking, data manipulation (inserts, updates and deletes) and external factors. (Here I am talking about spindle drives, where the distance between point A and point B can vary; SSDs are a whole different story.)

Index analysis and usage trending is an important part of index maintenance, since index access patterns play a significant role in the choice of indexing strategy. Indexes are composed of 8 KB pages (each page has a header, a footer and the records themselves), which means there is ‘only so much‘ that can fit on a page. An index page can be accessed for a single-record lookup or for a block (range) of records. Hence, without analysis and trending, successful index maintenance is impossible: we must know how and when an index is used before making decisions about it.
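A sketch of such trending, based on the sys.dm_db_index_usage_stats DMV (note that its counters reset when the instance restarts, so trend them over time rather than trusting a single snapshot):

```sql
-- Usage trending: how each index has been accessed since the last restart.
-- Seeks suggest single-record lookups, scans suggest range access;
-- updates show the write cost of keeping the index in sync.
SELECT OBJECT_NAME(us.object_id) AS table_name,
       i.name                    AS index_name,
       us.user_seeks, us.user_scans, us.user_lookups, us.user_updates,
       us.last_user_seek, us.last_user_scan
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
  ON i.object_id = us.object_id AND i.index_id = us.index_id
WHERE us.database_id = DB_ID()
ORDER BY us.user_updates DESC;
```
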

Index option adjustments are also critical for query performance. The index analysis and usage trending will dictate the choice of index options (properties) in order to ensure the most efficient query execution. There are various index properties; here are some:

  • fill factor
  • index padding
  • automatically recompute statistics
  • parallelism
  • etc.
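
For illustration, several of these properties can be set in a single rebuild statement (the table and index names here are hypothetical, and the values are examples rather than recommendations):

```sql
-- Rebuild a (hypothetical) index while setting several of the properties above.
ALTER INDEX IX_SalesOrders_CustomerID ON dbo.SalesOrders
REBUILD WITH (
    FILLFACTOR = 80,               -- leave 20% free space on leaf pages
    PAD_INDEX  = ON,               -- apply the fill factor to intermediate pages too
    STATISTICS_NORECOMPUTE = OFF,  -- keep auto-updating statistics
    MAXDOP = 4                     -- limit the parallelism of the rebuild itself
);
```
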
Index maintenance UI

Side note: there is a silly mix-up in the index maintenance area: both in the SSMS user interface and in the T-SQL statements used to create and maintain indexes we have ‘apples and oranges in the same basket’ – i.e. persistent property settings (call them flags, if you want) side by side with options that apply only while rebuilding the index. It is true that in order to change the fill factor you have to rebuild the index; at the same time, you will run into performance problems if you rebuild an index that has the ‘automatically recompute statistics’ setting set to OFF without noting this fact. This is a different topic, and I don’t mean to go on a rant. It’s just that this reminds me of the good old 2-stroke engine where you have to pour the gas and the engine oil into the same tank – a bit messy after all.
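
The persistent flags live in the catalog views, not in the rebuild statement. As a sketch, here is how to list the statistics that are not recomputed automatically – exactly the setting that is easy to clobber or overlook during a rebuild:

```sql
-- List user-table statistics with the persistent 'no recompute' flag set.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       s.no_recompute
FROM sys.stats AS s
WHERE s.no_recompute = 1
  AND OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1;
```
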

 

And here we come to the essence: index strategy. There is a plethora of factors and considerations when designing an index. Some of the many considerations may be (in no particular order):

  • the ratio between reads and inserts/updates/deletes
  • access patterns – single record lookup or a range
  • how often we query a table using a particular index (access patterns per minute, hour, day, month)
  • type of query load – OLAP, OLTP, mixed
  • what other indexes exist on the table
  • how fast our storage is
  • how much memory we have for the instance (do the indexes fit in memory, and how often are they flushed out)
  • how much disk volume we can afford for indexes
  • backup strategy
  • etc.
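
The first bullet above can be approximated from the usage DMV. This is a rough sketch – the counters only cover the period since the last restart, and a ‘write-heavy’ index may still be essential for a rare but critical query:

```sql
-- A rough read-to-write ratio per non-clustered index.
-- Indexes written far more than they are read are candidates for review.
SELECT OBJECT_NAME(us.object_id)                       AS table_name,
       i.name                                          AS index_name,
       us.user_seeks + us.user_scans + us.user_lookups AS reads,
       us.user_updates                                 AS writes
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
  ON i.object_id = us.object_id AND i.index_id = us.index_id
WHERE us.database_id = DB_ID()
  AND i.type_desc = 'NONCLUSTERED'
ORDER BY us.user_updates - (us.user_seeks + us.user_scans + us.user_lookups) DESC;
```
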

 

Bottom line:

Here is the golden rule for index maintenance:

There is no rule; it all depends! Think smartly and find radical long-lasting solutions.

As you see, index maintenance goes way beyond rebuild and reorganize. There are many prerequisites before a proper decision can even be made about the index rebuild scheme (what, where, when and how).

(And by the way, since I mentioned the SSDs, let me just say: no, SSDs do not solve utterly every single problem in the world. I wish they did. :) )

 

Bonus material:

Yesterday I read a newsletter by Ben Taylor (Index Maintenance) in which he shared some personal experience with index maintenance. I asked Ben whether he would mind my quoting his text and adding a few comments, since it makes for a very good hands-on exercise. Here is part of Ben’s text:

… Here is a real world example from my experience. I have a client where I literally have to re-organize certain tables every hour during business hours in order keep optimum performance, because the entire contents of these tables change two to three times daily. There are less than a million rows in each table. I rebuild the other indexes nightly if they are more than 10% fragmented…otherwise, I simply update the statistics. This database is a hybrid for both OLTP and OLAP. The hourly re-indexing supports the OLAP functions with a few rapid changing dimensions…

 

I would like to start by saying once again that I am not 100% familiar with the system Ben is working with, and the following thoughts are merely ponderings on the subject rather than a complete analysis. Furthermore, even though I do not know Ben personally, I have deep respect for him and his experience as a DBA.

So, here it goes:

My mantra (call it ‘motto’, or even ‘gut feeling’) as a DBA is as follows:

If my SQL Server is working too hard – then I am definitely doing something wrong and must correct it at first opportunity.

 

Having said that, I must share that there were a few things that alarmed me when I read Ben’s text.

  1. ‘re-organizing indexes every hour because the data changes 2 or 3 times daily’ seems a bit extreme to me. Index rebuilds and reorganizes are expensive operations in terms of IO, CPU and memory, and an expensive operation during business hours can make quite a few users unhappy.
  2. ‘There are less than a million rows in each table’ seems like a very insignificant amount of data to me… well, it does depend on the data types, the database design and the index design, of course! :) But generally, I deal daily with tables that hold from half a billion to a billion rows and grow at a rate of 20 million rows per day, and I ‘squeeze’ great performance out of them.
  3. ‘this database is a hybrid for both OLTP and OLAP’ – yes, I was afraid of that.

SQL Server offers very flexible possibilities for performance tuning; some of them are hidden in the choice of product edition, others are simply a matter of smart database and application design.

When we talk about SQL Server editions, there are a few ways to look at it: we can see the Enterprise edition as an overpriced but perfectly performing piece of heavy machinery, or we can look at the Standard edition as a cheap, semi-flexible, barely scalable version of the Enterprise edition.

At any rate, it is a proven fact that the Enterprise edition is the best choice when it comes to data availability, reliability and scalability. In Ben’s example I am sure that the Enterprise edition of SQL Server could address the performance issues (and replace the hourly index rebuild) by using some combination of the following:

  • database snapshots
  • partitioning and file groups
  • online operations
  • etc
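
For example, the online operations bullet looks like this in T-SQL (Enterprise-only in the editions of this era; the table and index names are hypothetical):

```sql
-- Rebuild an index online, so readers and writers are not blocked
-- for the duration of the rebuild (Enterprise edition feature).
ALTER INDEX IX_SalesOrders_CustomerID ON dbo.SalesOrders
REBUILD WITH (ONLINE = ON);
```
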

Also, the Standard edition offers a few ‘strong sides’ as well (some shy replacements for the missing Enterprise features):

  • query hints
  • filegroups placement on proper disks
  • SSDs used for particular tables / indexes
  • etc

It’s just that rebuilding/reorganizing indexes every hour seems like too much hard work for SQL Server, and my instinct tells me that there must be a better solution. I would be very curious, though, to know whether this is a particular case where it really is the best solution. Ben, please feel free to comment on this or send me an email, if you have the time.

Once again, this blog post is merely a good exercise in index maintenance and in the radical approach to solving performance problems.

One more note:

Think, think, think before designing. :)

Good luck.

 

2 comments to Total index maintenance

  • Nicely done.

    I wish I could provide more information regarding the unique situation I resolved with index re-organization hourly. Confidentiality agreements constrain my response.

    I will give some background here regarding constraints I faced.

    Money is a big issue, the Enterprise Edition of SQL Server is not an option.

    The most recent value of a pricing index was required for quotes produced in real time. The pricing values change on average 3-4 times within a business day, sometimes more often; 100% of the contents change 3-4 times daily. So, even with the Enterprise edition, snapshots provide no benefit. Background index rebuilding would have been useful…but the hourly reorganize was still the more acceptable hack.

    Partitioning the old-fashioned way using a federated view would have helped, and was even tried, with a separate file for each partition. 10 tables had 4 daily partitions, resulting in 40 different file objects. That way each table would not interleave with other tables, resulting in no fragmentation, since rows would be deleted from one partition and inserted into another (usually depending on how often the update occurred).

    This option was tested, and due to the federated view, inadequate performance was achieved.

    At the end of the day, the solution was that the process did not belong in the database in the first place. Running an index Reorganize each hour kept the indexes performing at the optimum speed for the time being while a more extensible solution is being created.

    The fact that the database is used for both OLAP and OLTP has no bearing on this particular issue. These tables, and the reports based on them, are strictly OLTP only. Sorry about that false trail.

    I hope this limited information provides some insight into the problem being solved. Thanks Feodor for including my experience in your posting.

    While there is no magic in indexes, there is definitely an art. There are many ways to create them, implement them and maintain them. The correct options at the correct time cannot be written in anything less than a book. I rarely see more than a chapter.

    Keep writing…this is valuable stuff.

    Cheers,

    Ben

    Ben Taylor – Editor, SQL Server Worldwide Users Group
    http://www.SSWUG.org
    btaylor@sswug.org

    • Ben, thank you for the comment: this seems to be a very interesting case; too bad that 100 years will have to pass before the topic’s confidentiality becomes irrelevant and you can share the details. :)

      You are right that the Enterprise edition does not always solve all problems. Unfortunately.

      I am just curious, in this case, whether you have considered SSDs and large amounts of cache as solutions?