Please wait your turn: A basic Introduction to Latches in SQL Server

Following on from my recent blog post on locking, a level deeper in the concurrency chain leads us to latches. A lot of people will have heard of them or will have seen the related waits within SQL, for example in the sys.dm_os_wait_stats DMV, but, I would wager, not many people could give me a good explanation as to what they are. I am hoping this gentle introduction will help you on the road to understanding.

Why are they necessary?

To start with, a basic definition: ‘Latches are lightweight synchronization objects, that are used by the storage engine of SQL Server to protect the internal memory structures’. Compare this with locks in SQL server, which are a transaction level construct to manage concurrency, latches work at the thread level to maintain data integrity within the internal memory structures. They are not exposed outside of the SQL Server Operating System (SQLOS). They are only managed by SQL Server itself, not by users (unlike locks that can be overridden via lock hints or changing isolation level). It is useful to keep in mind that a single transaction can use multiple threads at the same time.

To make that point again, consistency cannot be achieved in SQL Server with locking alone. SQL Server must access shared data structures that are not protected by the lock manager. Latches are the means for managing how the threads access the data structures. They are an internal SQL Server mechanism that the user cannot directly modify. The below example will help solidify what can go wrong if latches were not present:

Take the case where two threads are wanting to update the same page. Thread 1 will read the page header and find the next available free space in the page. Meanwhile thread 2 comes along and reads the header and is pointed to the same free space. Thread 1 writes to the space and updates the header. Unfortunately, this is too late as thread 2 is already targeting the same slot and overwrites the data that thread 1 has just written. This results in the data from thread 1 being lost. Using latches prevents this and other problems from occurring.

Types of Latch

In SQL Server there are three types of latches – IO Latches, buffer latches and non-buffer latches.

Let us have a look in more detail at those three different variations:

I/O latches happen when I/O operations against pages in the buffer pool are performed, when data is read from or written to the storage subsystem. These latches generate wait types starting with PAGEIOLATCH_*. They make sure that pages are not read concurrently multiple times into the buffer pool and that pages are not discarded from the buffer pool when an active transaction is still using them.

Buffer latches are used to protect the pages in the buffer from concurrently running threads. They are used to protect against lost updates, which can occur if two threads try to concurrently write to the same free space in a page, as only the latest operation would be maintained. Without these latches, it would be possible to read and write to a page concurrently in the buffer. These latches generate wait times starting with PAGELATCH_* (notice the difference, no IO in the middle).

If a page is modified in the buffer pool, it is called a “dirty” page, as it differs to the version of the page in hard storage, while the unmodified pages are called “clean” pages. When the page is accessed in the buffer pool, the SQLOS will acquire a buffer latch on that page. Unlike with locks, the SQL Server latch will not be held for the transaction duration; it will be released as soon as the thread has finished its work on that resource.

Non-Buffer Latches are used internally by SQL Server to protect shared data structures besides the buffer pool itself. These latches generate wait times starting with LATCH_*; however, this wait type is just a summary view of all the individual latches that SQL Server uses internally. A further breakdown can be found in the sys.dm_os_latch_stats DMV. You might see non-buffer latches being used in the following cases: excessive parallelism or too many autogrow/shrink events.

Knowing a little bit about each of these latch types is extremely useful, especially if you are using wait stats to troubleshoot a performance issue. It will enable you to have an understanding around which part of the process is problematic and whether there could be latch contention going on.

As with locks, SQL Server latches can come in various modes:

Destroy Latch (DT): Acquired when a latch is destroyed, and a page is to be removed from the buffer pool. DT latches are the most restrictive and block everything else.

Exclusive Latch (EX): Acquired when a page is being written to and prevents other locks being taken on that page.

Update Latch (UP): Allows read operations to access the page but blocks any other write operations.

Keep Latch (KP): Acquired when a page in the buffer pool is about to have a new latch placed on it and needs to be protected.

Shared Latch (SH): Acquired when a read is performed against a page.

Certain latch modes are incompatible with each other, and the below table summarises the compatibility between the different latch modes:

	KP	SH	UP	EX	DT
KP	Compatible	Compatible	Compatible	Compatible	Non- Compatible
SH	Compatible	Compatible	Compatible	Non- Compatible	Non- Compatible
UP	Compatible	Compatible	Non- Compatible	Non- Compatible	Non- Compatible
EX	Compatible	Non- Compatible	Non- Compatible	Non- Compatible	Non- Compatible
DT	Non- Compatible	Non- Compatible	Non- Compatible	Non- Compatible	Non- Compatible

SuperLatches

Superlatches (also called sublatches) aim to improve SQL Server efficiency in highly concurrent OLTP workloads. They are a mechanism to deal with latch contention and hot points by dynamically promoting latch arrays to a superlatch. This means that resources will still request latches in the usual way, but the latch will point to a different value in the array for each scheduler, meaning they only have to query the local state rather than the global state.

Latch Contention

Latch contention can be seen more frequently in systems with a large number of CPUs, and it occurs when multiple threads are trying to get incompatible latches on an internal memory structure. Remember latches are controlled internally, SQLOS will determine on its own when to use them. That does not mean you are not to blame, the way that you structure your databases and applications can still make their behaviour better or worse.

It makes sense that it is more of a problem for a system with a large number of CPU’s, because those systems will have more threads available and it is the threads that are competing for the latches. This blog does not have the scope to dive into the details of different types of latch contention, of which there are many, neither does it have the scope to take you through identifying it. However, if you are in the situation where you are increasing CPU but getting worse performance and a lower transaction/sec rate, there is a good chance you are suffering from latch contention.

Latch contention is mainly caused in high concurrency situations and below are a few common causes:

Application design: if an application issues a high number of concurrent requests against the same database or tables.

SQL Server logical files layout: This is a common one especially for tempdb contention. If PFS, SGAM, or GAM pages are constantly hit, you could see contention. This is seen quite often as a result of not having enough tempdb files or trying to create too many temp tables in tempdb.

The performance of I/O subsystems: If it takes a long time to bring things in and out of the buffer pool, it can become a contention point.

Where can you see latches

The Sys.dm_os_wait_stats which gives you historical information on the amount of waits and the max time for an occurrence of that wait, since the last time the server was restarted or the cache was flushed.

The Sys.dm_os_latch_stats dmv which gives you historical information on each latch related wait type, it breaks the Non-buffer related wait stats down further than Sys.dm_os_wait_stats does. Likewise reset on server restart.

The Sys.dm_exec_requests dmv contains live information on what is currently executing on the server, it provides info on what waits the processes are currently experiencing as well as parallelism information.

The sys.dm_os_waiting_tasks dmv contains live information about any tasks that are currently waiting.

There are also a number of extended events that can be used to capture information on latches:

In the performance monitor, there is also a SQLServer:Latches counter that provides a number of fields which return information on latches.

The SQL error log also captures some latch related error messages.

Summary

Overall, I hope you learnt a little something about latches today and strengthened the foundations of your knowledge. Latches are part of SQL server, and they are going to be used whether you like it or not (and you should like it). They are used as a concurrency protection method to prevent multiple threads modifying the same memory structure at the same time and come in three primary forms: Io, buffer, and non-buffer. Thank you for reading.

I look forward to seeing you on my next post.

Please wait your turn: A basic Introduction to Latches in SQL Server

The Coeo Blog

Why are they necessary?

Types of Latch

SuperLatches

Latch Contention

Where can you see latches

Summary

Subscribe to Email Updates

Related posts

Troubleshooting Ola Hallengren’s Maintenance Solution

Domain-Independent Windows Failover Cluster for SQL Server AlwaysOn Availability Group

SQLBits 2022 session - Field Testing Ola Hallengren’s Maintenance Solution

Windows Active Directory Detached Cluster with SQL Server Basic Availability Group