The Dirty (Baker’s) Dozen of SQL Server Technical Debt

The Baker’s dozen originated from bakers adding the thirteenth loaf to avoid a beating for underselling bread. There will (hopefully?) be no such penalties for SQL Server indiscretions, but try to avoid adding technical debt to undersell your future happiness.

What do we mean by technical debt?

Taking the easy option today, that will cost you down the line. Examples of debt are:

You have limited upgrade options.
Brittle code nobody wants to touch.
Change is cumbersome, requiring you to alter many modules.
Releases are done on Sundays impacting your free time.
You dislike your job, never achieving the elusive state of flow where your work is a joy.
An external audit or code review terrifies you.
Onboarding new staff to work on the database is a massive time sink.
The application interacting with the database will imminently stop working.

210520_AJ_TechDebt

Photo by Alexander Schimmeck on Unsplash

Here are thirteen examples to avoid:

1) Cross-Database Queries

Don’t tightly couple different databases on the same SQL Server by referencing one from the other, (e.g. SELECT CustomerKey FROM Sales.dim.Customer;) unless necessary.

You’ve heard the one about disaster only strikes in the middle of the night? Waking up to fix that is stressful enough, without adding extra complexity into the mix. You restore one database, then realise you don’t have log backups to bring the other database to the same point in time. You get the picture.

Azure SQL Database does not support cross-database queries. Do you want to migrate to Azure SQL Database in the future? Maybe, maybe not, but it’s great to keep your options open and have that choice without a load of rework.

Lastly, test automation is far more challenging with cross-database queries. A key tenant of unit testing is mocking tables and self-generating test data. The tSQLt unit testing framework does not support mocking tables outside the current database.

2) Case

Get the case of every SQL Server object and column precisely correct when you reference them in your code. Sure, you’re using a case-insensitive collation today, and your code works. What if you deploy your project to a database with a case-sensitive collation tomorrow and you have to rework? Try this in a database with a case-sensitive collation to demonstrate:

SELECT create_date
FROM sys.Tables;

Msg 208, Level 16, State 1, Line 1

Invalid object name 'sys.Tables'.

Besides the code looking lazy and less readable, your pipeline will throw loads of warnings during the build process. You will then have to filter the noise, making finding genuine errors more difficult.

You might say the above doesn’t apply to you, and the case doesn’t matter. You can use the wrong case or get it right for zero extra effort. There is no excuse for using the incorrect case.

3) Sysadmin

Have you ever encountered a problem, you know deep in your gut is security related? What permissions should I apply, where is the issue, how did I end up here? You add the account to the sysadmin role and guess what, everything works fine. You then make a little deal with yourself, “I’m so busy right now, I’ll leave the sysadmin permissions in place and come back and fix that later”. Stop yourself there and remember the adage:

“There is nothing more permanent than a temporary solution”.

The time to fix never comes. The security audit only happens after you have left the company, leaving an old friend to pay down your technical debt. Always apply the principle of least privilege and don’t plaster over good practice with short term fixes.

4) Surrogate

Do you need a surrogate key? No problem:

CREATE TABLE dbo.Sales
 
(
 
SaleID INT IDENTITY NOT NULL CONSTRAINT PK_Sales PRIMARY KEY
 
);

All is well, and you move on. Well, all is well, until seven years down the line, your business is successful, and you have 1,800,000,000 sales lines (not a bad problem to have in itself). The slow and dawning realisation hits you that your database will be down and not accept new sales when that number hits 2,147,483,647. The next realisation hits you that changing SaleID to a BIGINT requires massive rework in downstream applications.

Take time when creating tables to select the correct data type (INT, BIGINT, UNIQUEIDENTIFIER?). Predict the data volume with a large dose of contingency.
Surrogate keys should be meaningless and arbitrary, never assume chronology with an increase in value. One option when running out of identity values to keep the lights on is to reseed and use the negative range. That avenue is closed if calling code assumes chronology.
If using INT, don’t accept the default seed of one without sufficient thought. Why not specify “SaleID INT IDENTITY(-2147483648, 1) NOT NULL” to give you double the range, and potentially give you a solution to see out the lifetime of the application.

5) Star

Avoid “SELECT *” in production code. What happens down the line when a release adds columns to the table? The worst-case scenario is calling code breaks with “Column name or number of supplied values does not match table definition.” Causing more hassle and rework for you. The best-case scenario is your reports don’t error but are now bringing unnecessary bytes over the wire, negatively impacting performance.

6) Ordinal

Go to the trouble of writing explicit column names in your order clauses and don’t take shortcuts. For example:

SELECT CountryName,
       Continent
FROM Application.Countries
ORDER BY 2,
         1;

You want to order by Continent then CountryName, so write those columns in your order by clause. With ordinal ordering, when the next developer changes the select clause to add a column, you are now relying on them to read down the code and realise they must change the order by too. Another risk is

SELECT *
FROM Application.Countries
ORDER BY 3,
         7;

That code block will change behaviour if someone adds a column to an earlier ordinal position. Again more bugs, rework and lost productivity (and another reason to avoid select * mentioned elsewhere).

7) Upgrades

Tomorrow, your version of SQL Server will be one day closer to exiting Microsoft support. Running an unsupported data platform is not a place anyone wants to be, so create an upgrade plan. Upgrading from a position of weakness (i.e. we were too busy before, but our version of SQL Server leaves extended support next month, and we’ll fail our PCI audit) is a horribly stressful situation.

I realise there can be complexities with upgrading due to vendor support, etc. but the longer you leave it, the more debt you will pay. Attempting to upgrade a jump of four, five, or six versions of SQL Server are complex projects with more incompatibilities to overcome.

Always consider cloud options here too. If you have ruled this out before, revisit and question if your reasons are still valid. Using Platform as a Service evergreen offerings, you will never patch or upgrade again are the very definition of removing technical debt.

8) Set-Based

Assume sets when working with SQL code. Just because your code updates one row today, it may update zero, one, or many rows in the future. Create logic that handles all scenarios to avoid unexpected behaviour down the line.

A classic example is a trigger that assumes inserts into its parent table are one row. That seems reasonable as the only insert method is a stored procedure that accepts parameters and inserts a single row. That might not be the case forever, as requirements and businesses change. Your colleague adds a new routine the following year to mass insert customers following a company acquisition. Your friend is either faced with a load of trigger rework or doesn’t realise the trigger implication and causes a production outage breaking audit logic.

Like elsewhere in this post, thinking in sets is another scenario where you future proof your solution for zero extra effort, avoiding pain down the line.

9) Manual

Manual build, test and deployment are error-prone, time-consuming and stressful. Introducing a DevOps culture will revolutionise your ability to add value faster and more reliably. Waterfall releases that break things left and right are technical debt, and there is a better way.

Start with an audit of the flow of a work item through your processes. This exercise will be enlightening and show many hidden time sinks. How does your team gather requirements, manage change, build projects, test, release, document, monitor and support database applications? Are you releasing many features once every quarter, emailing a test team to request a manual test cycle, continually firefighting production issues?
Where is the bottleneck? What can you automate and improve? Maybe add your database to source code control, create an automated build pipeline, create a single unit test. Start small, but start somewhere on your journey to an improved process.

A full DevOps discussion is out of scope here, and there are many resources online to help you. Start small and don’t panic that you have to do it all today.

10) Jobs

You have a robust set of SQL Agent jobs for maintenance and ELT (extract, load, and transform).

There are no problems here, and you move on. But what happens when your Always On availability group fails over to a secondary replica? Do the jobs even exist on the server housing the secondary (now primary) replica? Do the jobs still run on the old primary, fail and start sending alerts requiring manual intervention?

Handling jobs gracefully is another of those occasions where its easy to say you’ll fix it down the line when you have more time. It’s far better to implement a solution while you’re familiar with the code and working on setting the job up. Add an initial step to each job to check if the host houses the primary writeable replica before proceeding. You will create a robust solution and avoid nasty surprises down the line.

11) Constraints

Shifting left is the DevOps principle of finding errors as early as possible in the development pipeline when they are cheapest to fix. Traditionally, you find errors early by a robust unit testing strategy covering your codebase. There is a long-available oft-neglected technique also available to you – creating constraints. Take the time when creating tables to decide what constraints are appropriate, e.g. primary key, unique key, default and check constraints. Consider the example of a customer dimension table in your data warehouse that tracks change over time with Valid From, Valid To, and Is Current columns. Here are some ideas:

What is the primary key?
What is the natural key? Can you create a unique constraint?
Can Valid From be after Valid To?
Can a customer ever have zero or more than one Current row?
Can a customer be born in the future? Can a customer be born in 1750?
Can a postcode be seven consecutive numbers?

These are examples, and the list goes on. Take 30 minutes when creating your table to get that time back many, many times over in the future.

Creating constraints avoids the adage of rubbish in – rubbish out. Those constraints will sit there forever, allowing developers to find mistakes in their data manipulation statements immediately, preventing costly mistakes, making it through to production.

12) Column List

Specify the column list when inserting data. Look at this example:

DROP TABLE IF EXISTS dbo.Store;
 
CREATE TABLE dbo.Store
(
    StoreKey INT NOT NULL
        CONSTRAINT PK_Store PRIMARY KEY,
    StoreName NVARCHAR(20) NOT NULL
        CONSTRAINT UK_Store_StoreName
        UNIQUE,
    StoreRegion NVARCHAR(20) NOT NULL
);
 
INSERT INTO dbo.Store
VALUES
(57, N’London’, N’South East’);

(1 row affected)

I didn’t specify the column list when inserting. However, I saved some typing, and the insert worked, so what’s the problem?

What if we add a new column?

ALTER TABLE dbo.Store ADD Floorspace INT NULL;
 
INSERT INTO dbo.Store
VALUES
(94, N'Birmingham', N'Midlands');

Column name or number of supplied values does not match table definition.

Now our application inserts start failing. A DevOps culture allows you to make database changes like this continually during the business day, maybe way in advance of you releasing the feature to the front-end code. Specifying the column list (i.e. INSERT INTO dbo.Store (StoreKey, StoreName, StoreRegion)) allows that to happen, making your codebase far more robust.

13) Comments

Add comments to your code. Consider the database application from the perspective of a new pair of eyes and a future colleague picking up your database code. How easy will it be for them to understand the reasoning and functionality? Try and help them out, leaving breadcrumbs of information to give them a fighting chance. A well-commented, modular codebase will make a new team member productive far quicker, reducing the time burden on you and others.

Commenting your commits and pull requests should be in addition to and not instead of commenting directly within your code. Don’t make others cross-reference an external system when bug-fixing a stored procedure, what if they don’t have access to that external system? Inline comments will significantly help others.

What about taking things to the next level and adding extended properties to describe all your SQL Server modules? That way, when you get the requirement down the line to document your database for regulatory reasons, you use a third-party documentation product to create the output. Third-party tools will include your extended property description of each module, table and column for zero extra effort, and you won’t find a better example of removing technical debt than that.

Thirteen, Over and Out

There are thirteen examples of technical debt you should consider removing. What do you think? Agree? Disagree? What items did I miss? Honourable mentions go to omitting the schema name, inconsistent naming convention, ETL not ELT, no normalisation, and the lack of abstraction layer. But I had to stop somewhere, and I had already written that Baker’s dozen stuff at the start.

You don’t have to rip out all these items from your environment immediately. Similar to the strangler pattern to migrate a legacy application, introduce change gradually when working on your database applications. A static code analysis tool such as Redgate SQL Prompt (https://www.red-gate.com/products/sql-development/sql-prompt/) can also greatly help you remove lots of these bad practices.

You may feel lots of these items are inconsequential and don’t matter a great deal or don’t apply to you. The Team Sky cycling team introduced the concept of marginal gains, where increasing the standards of many items by a tiny fraction has massive performance gains across an organisation. Removing technical debt allows you the freedom to update your application when you want and how you want with confidence things won’t break or require extensive rework. You will ultimately enjoy your job far more, reaching that elusive state of flow and joy.

The Dirty (Baker’s) Dozen of SQL Server Technical Debt

The Coeo Blog