The Coeo Blog

When Two Billion is Not Enough

Written by Andy Jones | 03-May-2016 14:33:42

Introduction

The SQL Server integer data type takes four bytes of storage and can store a maximum value of 2,147,483,647 (2^31 - 1). This is commonly used as the data type for columns with an identity property. Unfortunately, what is also common, is the maximum column value ticks up until you start to get that nervous feeling where 2,147,483,647 doesn’t seem quite so large any more. If that is the scenario you are faced with, action needs to be taken immediately to avoid all inserts failing and the dreaded red text of

 

Action Plan

  1. Estimate the time period you have until the maximum integer value will be exceeded. Don’t assume linear growth, make the presumption that whatever entity the identity column is uniquely identifying is growing exponentially. Far better to be cautious and estimate a date that is too soon. Base your estimate on the elapsed time to reach your current maximum value and perform a simple calculation but factor in significant contingency.
  2. Start logging the maximum value in the identity column on a daily basis. You can then revise your estimate over time to provide far more accuracy.
  3. Decide collaboratively within your organisation the steps you are going to take to resolve this issue.

Remedy

Two methods I have used to deliver this type of project have been

  1. Use the negative range of the integer values.
  2. Change the int data type to be bigint.

Other options are available (maybe the use of a GUID), but the scope of this article is these two common solutions.

Use the Negative Range of Integer Values

Probably because the default is to seed at one, the majority of identity columns I see start at one and increase into the positive integer range, meaning half the available values are wasted. This option is to re-seed the affected column at -2,147,483,648, allowing in excess of 2 billion extra inserts.

Pros

  • Column data types do not have to change.
  • Database code data types within stored procedures, functions, triggers etc. do not have to change.
  • Front end code and ETL pipelines do not have to change to handle a different data type.
  • Perhaps a good choice if you believe the remaining negative values will see the software reach it’s end of life.

Cons

  • Good practice in software development says no meaning should ever be given to surrogate keys. This is not always the case however, so if your software assumes that the chronology of inserts was in identity order then this code will break post change. All software would have to be checked for such logic. An example might be an incremental export to a data warehouse based on the identity column due to the lack of an alternative filter column in the OLTP source. If the upper bound was X yesterday and today we are importing all rows where the identity column is greater than X then this logic will have to change.
  • Full regression test would be required.
  • This negative portion may eventually run out too and the change to bigint would have to be made anyway.

Change to bigint

Pros

  • Only one change is required. Once this project is successfully delivered there are 9,223,372,036,854,775,807 available identity values.

Cons

  • All affected database columns have to be changed to bigint.
  • Foreign key relationships have to be dropped and re-created.
  • All affected indexes have to be rebuilt. This includes non clustered indexes where the data type of the clustering key is changing.
  • All affected data types within SQL Server code have to be changed.
  • All relevant data types within front end code have to be changed.
  • ETL pipelines to external systems will have to change where appropriate.
  • Any data warehouses or other external systems that store this column will have to change.
  • Full regression test would be required. Automated database testing can greatly assist with this task. The tSQLt testing framework could be used for example to test the execution of all affected code.
  • The database size will grow. Each identity now takes an additional four bytes of space. This increases further where the identity column is the cluster key that is stored in each non-clustered index. This will affect backup size, duration of maintenance routines etc.

Conclusion

  • Regularly and automatically monitor the maximum values in transactional tables with the identity attribute.
  • Immediately implement the action plan above if any are suspected of reaching the maximum value.
  • Plan for growth at the outset of future projects. Is the int data type really appropriate?
  • If using an int data type, can you seed your identity at the minimum negative value of -2,147,483,648 at the start of the project?
  • Don’t provide meaning to surrogate keys when writing T-SQL. Examples of providing meaning would be assuming surrogate keys are inserted chronologically and hard-coding surrogate keys in where clause predicates.
  • Consider using automated testing of database code. Regularly writing tests will produce good code coverage over time that will greatly assist in regression testing in projects such as these.
  • If you identify this as an issue in your organisation early and the use of the negative range would only be a temporary fix, then it is best to select the bigint solution.
  • Where there is great urgency to deliver a solution if the maximum int value is imminent, you may decide to use the negative int range as a temporary solution to avoid a significant software rewrite.
  • Another reason to use the negative range might be you believe that solution will last for years and possibly the lifetime of the database.