
Killing Your SQL Server with Kindness

Joe Pollock

When you discover a poor, uncared-for SQL Server instance that has been neglected for years, your immediate response is to get maintenance going and give it some love: get it back to its former glory and running optimally.

But you need to approach this with caution.

There are risks associated with rushing into this and you need to tread carefully. This may well be a server that has sat quietly in the corner for a long time, steadily chugging along with never a problem. Even though it has never been backed up, checked or optimised, in the eyes of the users it is just fine. The last thing you want as a DBA is to come flying in to make it “better”, only to cause it to run out of disk space, be slower, suffer blocking or come to a grinding halt.

Just like Maslow’s hierarchy, databases have needs too, and there is an order in which to approach this task: start with protection at the bottom of the pyramid, move through health, and end with performance improvements. And at each tier there are risks and considerations when dealing with a previously unmaintained instance.

Backups

Taking backups is the first step on the road towards a happier server, but even here there are dangers. The first consideration is storage. You would ideally store the backups on a drive external to the server, so that a single storage failure cannot take out both the databases and the backups; but if you have to use a local drive that is shared with database files, then you can easily fill the disk (depending on database size and retention) and prevent data or transaction log files from growing.
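
As a minimal sketch, that first backup to a share on separate storage might look like this (the share path and database name are placeholders):

    -- Compressed full backup to storage external to the server;
    -- CHECKSUM verifies page checksums as the backup is written
    BACKUP DATABASE [SalesDB]
    TO DISK = N'\\backupserver\sqlbackups\SalesDB_Full.bak'
    WITH COMPRESSION, CHECKSUM, STATS = 10;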

A less often considered side-effect is the interaction between full backups and the transaction log. Especially with large, highly transactional databases, you have to remember that transaction logs cannot clear down whilst a full backup is running, even if you back up the log or are in simple recovery mode. This is necessary so that the full backup can maintain transactional consistency when it is restored. So your transaction logs need to be large enough to cope with however long the full backups take, or they will grow, potentially fill the disk, and stop the database from being able to make changes.
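
You can watch for this whilst a full backup runs: sys.databases reports why each log cannot clear, and ACTIVE_BACKUP_OR_RESTORE means a backup is holding it up.

    -- Why can't each transaction log clear down right now?
    SELECT name, log_reuse_wait_desc
    FROM sys.databases;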

Consistency Checks

Of all the maintenance we are considering here, running DBCC CHECKDB is the least worrying. It will use some memory, but it disfavours the pages it reads in the buffer pool so they are reused between database checks rather than pushing everything else out. By default it runs against an internal snapshot, which (due to Windows sparse-file limitations) can fail under sufficient I/O activity, but that will only fail the check and not break the server.

The main concern is how the load can affect performance. CHECKDB is designed to run as quickly as possible, so it uses as many resources as it can to achieve this. Running it when you have high user activity can slow down the server, so you need to be careful with your scheduling, and also look at the Microsoft recommendation to use physical-only checks as the standard execution method to reduce the work involved.
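
As a rough sketch, a physical-only check looks like this (the database name is a placeholder); it validates page structures and checksums whilst skipping the more expensive logical checks:

    -- Lighter-weight check suitable as the regular method;
    -- still run a full CHECKDB periodically
    DBCC CHECKDB (N'SalesDB') WITH PHYSICAL_ONLY, NO_INFOMSGS;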

Statistics

If you are running sampled updates of index statistics, then this would not normally impact performance greatly, as you are not reading much data to build the statistics. A higher sample rate or a full scan has a greater impact, as you can end up reading every leaf page in every index, which on a large database can take hours.

You need to be aware of how updating statistics can affect query performance. Execution plans are built based on the index statistics, so as the statistics are updated, plans can change too. This should be for the better, but not always. You also need to check if auto-update stats is turned on for the database. If it is, then the statistics should be mostly okay, and a maintenance job will have less impact. If not, then the server will have been running with old statistics compared to the data, and updating the statistics could cause a lot of execution plans to change.
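
A quick sketch of how to check both things (run the second query in the database you are assessing):

    -- Is auto-update statistics enabled per database?
    SELECT name, is_auto_update_stats_on
    FROM sys.databases;

    -- When were statistics last updated, and how much has changed since?
    SELECT OBJECT_NAME(s.object_id) AS table_name,
           s.name AS stats_name,
           sp.last_updated,
           sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1;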

Testing is key if you are able to do it, and make use of Query Store if your SQL Server version has it: it lets you see where execution plans have regressed, and you can force the old plan whilst you try to fix the issue.
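
If a plan has regressed, forcing the old one is a couple of calls; the IDs below are placeholders you would read from the Query Store views or the regression reports in Management Studio:

    -- Pin the previously good plan for a regressed query
    -- (find query_id/plan_id in sys.query_store_query and sys.query_store_plan)
    EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 417;

    -- Remove the pin once the underlying issue is fixed
    EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 417;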

Indexes

This is the top tier of the pyramid, and the largest cause of issues when introduced to an unmaintained server. With index maintenance we are talking about removing fragmentation, which is both internal (low page density) and external (logical order not matching physical order). This is achieved by rebuilding or reorganising the indexes.

Lots to go wrong here. Firstly, timings. Unlike backups and consistency checks, which take about the same time to execute every day, index maintenance runs for a length of time driven by the current level of fragmentation, which varies from day to day. This can lead to the maintenance suddenly overrunning into a period of peak activity, causing havoc with slow performance. This needs monitoring, but it is always a good plan to set up a “kill” job, which terminates the maintenance at a fixed time if it is still running, sacrificing the completion of the index work in favour of protecting server performance.
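
One simple way to build the kill job, assuming the index maintenance runs as a SQL Server Agent job (the job name here is illustrative): schedule a second job at your hard cut-off time whose only step stops the first if it is still running.

    -- Step in a separate Agent job, scheduled at the cut-off time.
    -- Stop the maintenance job only if it is still executing.
    IF EXISTS (SELECT 1
               FROM msdb.dbo.sysjobactivity AS ja
               JOIN msdb.dbo.sysjobs AS j ON j.job_id = ja.job_id
               WHERE j.name = N'Index Maintenance'
                 AND ja.start_execution_date IS NOT NULL
                 AND ja.stop_execution_date IS NULL)
        EXEC msdb.dbo.sp_stop_job @job_name = N'Index Maintenance';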

Then you need to look at how you are doing the maintenance. It is very common to use the 5/30 rule here, which is:

  • Under 5% fragmentation, do nothing
  • Between 5% and 30%, reorganise
  • 30% and over, rebuild
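
As a sketch, those thresholds are usually driven from sys.dm_db_index_physical_stats (the database, table and index names below are placeholders):

    -- Fragmentation per index for one table; LIMITED mode keeps the scan cheap
    SELECT i.name AS index_name,
           ps.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'LIMITED') AS ps
    JOIN sys.indexes AS i
      ON i.object_id = ps.object_id
     AND i.index_id = ps.index_id
    WHERE ps.avg_fragmentation_in_percent >= 5;

    -- Then per index, based on the percentage returned:
    ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REORGANIZE;  -- 5% to 30%
    ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD;     -- 30% and over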

On a server where maintenance has never been done, you will have a lot of indexes with high fragmentation, so the first time you run this you will get a lot of rebuilds. It is safer to start with higher percentages on the boundaries to limit this, and adjust them down once the index maintenance starts to complete within the maintenance window.

Then you need to check your SQL Server edition. If you are running Standard edition you don’t have access to online index rebuilds, so if you are using the above (very common) 5/30 method, then you will suddenly start taking indexes offline to rebuild them, which at best can affect performance and at worst makes a table inaccessible and causes blocking. If you don’t have a proper maintenance window when using Standard edition, then you are safer using reorganise, as this is always online. It has the added benefit that if it gets stopped by the kill job, all work up to that point is preserved, whereas for a rebuild all the work is lost.

To rebuild indexes you need free space in the database, and the larger the index the more space you will need. So expect your databases to grow the first time you run maintenance, and if you are already tight on space on your data drives, you have the potential to fill them up.

The last point is the transaction log. Index maintenance generates a lot of log records, which affects how large the transaction log needs to be and also makes the log backups larger. And if you are using Always On Availability Groups, this can cause latency between the replicas.
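
You can keep an eye on this whilst the maintenance runs; sys.dm_db_log_space_usage shows the current log size and how full it is for the database you are in:

    -- Transaction log size and percentage in use for the current database
    SELECT total_log_size_in_bytes / 1048576 AS log_size_mb,
           used_log_space_in_percent
    FROM sys.dm_db_log_space_usage;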

Remember that index fragmentation is rarely the root cause of slow performance issues. I’m not saying don’t maintain indexes, but try not to break everything as you do it.

Conclusion

Once maintenance is set up and running, it is mainly smooth sailing, with not too much to worry about. Just be careful as you start to maintain neglected servers, and don’t try to do it all in one go. Start with backups and consistency checks, as these give you the confidence that the databases are not corrupted and that you have recovery options. Then move on to statistics to give the optimiser the best chance of producing the right execution plans, and after that index maintenance. This last one introduces the most risk for the least benefit, and it is where we always see the most issues. Usually in the middle of the night.
