One of the common customer queries we get at Coeo is how to provide high availability for SQL Server. I have been advocating the use of SQL Server Failover Cluster Instances (FCI) to provide high availability for over 10 years. However, during a review of our reference architecture the question arose, "is there any scenario where we would recommend an FCI today?".
Many people love FCI's as they are fairly easy to manage and well understood. Historically, they are a great way to host multiple SQL Server instances on physical hardware without incurring additional license charges. Although, now I would recommend OS virtualisation to provide better isolation, standardisation and control. FCI support all SQL Server features, components and have comprehensive support from third party vendors. This used to be the kicker and consequently FCI became the fall back highly available architecture choice for the last decade. With the launch of SQL Server 2017 (and SQL Server 2016 SP2) Microsoft now provides support for some critical features that make it possible to use Availability Groups (AG) in previously unsupported scenarios; examples include complete Distributed Transaction Coordinator (DTC) and SQL replication distributor role support. These were the final barriers to be eliminated by Microsoft this month (https://blogs.msdn.microsoft.com/sql_server_team/replication-enhancement-distribution-database-in-availability-group/).
Once a customer has upgraded to SQL Server 2016 SP2 or 2017, I can think of very few reasons to use FCI over AG. There is only one scenario I could imagine - SQL Servers with sustained IO write bandwidth greater than 660 MB/s as this can be a limiting factor for AG synchronous replication. For applications that only support older versions of SQL Server (2014 or earlier) then there is still a good case to use FCI. Availability Groups provide so many more benefits that should encourage customers to upgrade as soon as possible. Some of the advantages of AG over FCI are:
- AG typically fail over in under one second while FCI take at least 30 seconds
- AG do not require shared storage, so SQL Server can run on virtualised platforms and public cloud providers where shared storage is often unavailable
- Rolling hardware, OS and SQL Server upgrades without an outage
- AG can provide geographic high availability and disaster recovery without expensive storage replication solutions
With SQL Server 2012, Microsoft enabled FCI using SMB shares, but I find very few people using these as the SMB share must be highly available and that often requires shared storage which could more easily have been presented directly to the SQL Server cluster. With the advent of Windows Server 2016 Data Centre edition, Microsoft introduced a new technology called Storage Spaces Direct (S2D). S2D provides OS based storage replication so that local disks can be presented as shared storage known as cluster shared volumes. This makes it possible to deploy FCI on virtual platforms like Microsoft Azure with SQL Server 2012 or later. There are various reasons I would advise customers avoid using S2D which I will blog about soon. But, suffice to say, they are complicated to deploy and difficult to manage or monitor. Additionally, FCI do not support DTC in Azure because of limitations with the internal load balancer, which limits the possible usage scenarios https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-portal-sql-create-failover-cluster#limitations.
Back to the original question. Given that virtualised platforms like Microsoft Azure provide protection from hardware failure, are FCI still required? I don’t think so! If your applications matter to your organisation upgrade them to SQL Server 2016 SP2 or 2017 and use Availability Groups. Otherwise, run your applications on a standalone instance of SQL Server on a virtual platform to achieve hardware resilience.