At Coeo we make heavy use of Ola Hallengren’s Maintenance Solution for configuring database backup, integrity check and index optimisation jobs. We use it because it’s great, and we presented on it at SQLBits last year.
One of the strengths of the Ola jobs is their sheer customisability. With around 60 parameters for the backup procedure alone, you can fine-tune the behaviour of the job to suit just about any environment or requirement – scroll to the bottom of the (very handy and comprehensive) documentation to see 15 different ways to deploy the job. Only a couple of these parameters – @Databases and @BackupType – need to be specified when the procedure is run, as everything else has a default value set within the stored procedure definition. If you set it up wrong, or it fails for some other reason, there is also an exhaustive amount of error-handling contained within the procedures, and the Maintenance Solution is widely enough used that if you run into an error message that you don’t understand you can probably find people talking about it on Stack Exchange or somewhere similar.
Recently, I found that one of our customers’ backup jobs was failing with the following error:
“The parameter @AvailabilityGroupDirectoryStructure contains one or more tokens that are not supported.”
This would be a perfectly fine and helpful error message, if we had specified a value for the @AvailabilityGroupDirectoryStructure parameter, which we hadn’t, or if the SQL instance was part of any Availability Groups, which it wasn’t. I opened up the stored procedure to see what causes that particular error message to throw, and it was a clear and straightforward error of handling: it takes the given value for @AvailabilityGroupDirectoryStructure and applies a bunch of nested REPLACE() functions to get rid of all the valid tokens. If anything is left over, it throws an error as it means you’ve included something it hasn’t expected.
IF EXISTS (SELECT * FROM (SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@AvailabilityGroupDirectoryStructure,'{DirectorySeparator}',''),'{ServerName}',''),'{InstanceName}',''),'{ServiceName}',''),'{ClusterName}',''),'{AvailabilityGroupName}',''),'{DatabaseName}',''),'{BackupType}',''),'{Partial}',''),'{CopyOnly}',''),'{Description}',''),'{Year}',''),'{Month}',''),'{Day}',''),'{Week}',''),'{Hour}',''),'{Minute}',''),'{Second}',''),'{Millisecond}',''),'{Microsecond}',''),'{MajorVersion}',''),'{MinorVersion}','') AS AvailabilityGroupDirectoryStructure) Temp WHERE AvailabilityGroupDirectoryStructure LIKE '%%')
BEGIN
INSERT INTO @Errors ([Message], Severity, [State])
SELECT 'The parameter @AvailabilityGroupDirectoryStructure contains one or more tokens that are not supported.', 16, 1
END
But the parameter was set to the default, and when I ran this bit of code the whole string was cleaned up as expected. So why was it failing here?
After some head-scratching I eventually decided to compare the script as it was on the customer environment with a clean version from Ola’s website – there are plenty of places online where you can compare two text files and have the differences highlighted, change-control style. By doing this I found the issue.
Essentially, somebody had needed to change the @CleanupMode value, which determines the point during the backup process that the job should delete old backups. The two possible values for this parameter are “BEFORE_BACKUP” and “AFTER_BACKUP”, and each one does what it says on the tin. “AFTER_BACKUP” is the default as it is a bit safer – if the present backup fails you’ll still have your old one. However, “BEFORE_BACKUP” is a useful setting to apply in a situation where disk space is low; in the default behaviour, disk space will spike downwards in the short period of time between the new backup being made and the old one being deleted. “BEFORE_BACKUP” causes disk space to spike upwards due to the brief time when there is no backup at all (or one less backup than you have configured the job to keep).
So that’s @CleanupMode, which has nothing to do with @AvailabilityGroupDirectoryStructure (which defines what folders should be created to separate different backup files, specifically for any databases that are in an AG – the fact that this is even something configurable should give you an idea of the flexibility of complexity of the solution). If you wanted to specify a value for @CleanupMode, then the thing to do would be to define the parameter when the procedure is called within the job step, i.e.:
Alternatively, you could modify the stored procedure itself, and change the default value of the parameter to achieve the same result. If you’re the only DBA with eyes on the server and you know what you’re doing then it’s not the end of the world to do it this way, though I can’t think of any advantages myself, and anybody else familiar with the Maintenance Solution would get confused if they opened up the job step and saw less information than they expected.
What you would want to avoid, however, is modifying the stored procedure this way using a text editor that, unbeknownst to you, also introduces line breaks past a certain length, breaking the procedure in many places. You can be quite liberal with line breaks in SQL when it comes to the overall syntax, but if you cut a word in half then things aren’t going to behave. This is what had happened, and our @AvailabilityGroupDirectoryStructure error handling block happened to include the first long line that was broken up in such a way that it no longer worked. The job failed, and we received a confusing error message. With the mystery solved I was able to remove all the erroneous line breaks, declare the @CleanupMode value in the right place and re-run the job successfully.
This was easy to troubleshoot because of how well-designed the Maintenance Solution is, but highlights some of the best-practice points for working with it. Firstly, if you’re going to specify a non-default parameter value (and you probably are), then do it when the procedure is called rather than modifying the procedure itself. In addition to the above issue, Ola Hallengren will often update his procedures to improve them or keep up with attendant developments in SQL Server (Azure compatibility, for example), so if you’ve fiddled with the procedure and then install a new version, those changes will be overwritten and your backups won’t behave as you expect. It is worth scripting out the procedure, however, to have a look at the nuts and bolts of it and understand how it works, as this will aid troubleshooting in case of any issues.
Secondly, this highlights the value of playing about with the procedures on your own test environment before deploying in production, to ensure you know what will happen when you schedule the job to run. My favourite parameter is @Execute which, when set to N, means the procedure does nothing at all, but instead tells you what it would do if it were to run. This is very helpful for testing out things like filenames and directory structures to ensure the final result is what you expect, and it will even raise error messages if it can tell you’ve told it to do something it can’t (it can’t catch everything this way, though; the job doesn’t monitor disk space, for example, so it won’t know it’s about to fill your drive until it does).
Finally, use the documentation whenever you’re making a change that you’re less than 95% confident on. It’s there, and it’s good, so check it whenever you’re configuring any unfamiliar parameters.
Let us know in the comments if you’ve ever run into any interesting issues with Ola’s Maintenance Solution, and how you think it compares to the native maintenance plans.