How in-memory ColumnStore Indexes change the way you have to size database servers

Written by Gavin Payne | 24-Dec-2014 16:04:54

In-memory ColumnStore Indexes in SQL Server 2014 can be a performance game changer for workloads involving large datasets. They can also significantly affect the amount of server resource needed to provide good query performance – but not always in the ways you’d expect.

A familiar database server performance issue

It’s common to see a transactional system’s database with lots of relatively small tables and a small number of very large tables - history logs or archives. The activities for those large tables is often mostly small inserts with occasional large read-only queries; queries that are often the database server’s most expensive workloads using all of their processor and memory resources. Maintaining consistent application performance when there are extreme variations in workloads is a common problem a lot of organisations have unsuccessfully tried solving by just adding more server resources.

SQL Server ColumnStore indexes

New in SQL Server 2012 was one of Microsoft’s transactional in-memory technologies, ColumnStore indexes. For the purpose of this article, they’re an alternative method for storing and querying very large amounts of SQL Server table data that in the right situation is a lot more efficient than SQL Server’s traditional methods. In SQL Server 2012, these indexes weren’t updatable which mostly limited their use to data warehouse scenarios. In SQL Server 2014, they became updatable allowing their use to spread beyond the read-only world and towards transactional systems.

The link below is to a presentation I gave two years ago about the then recently released ColumnStore technology: here

ColumnStore index proof of concept – Starting point

Recently, I was helping a client evaluate the new features in SQL Server 2014 including ColumnStore indexes. One of their application systems had exactly the problem I described above, specifically the following environment:

Table rows	Table size	Server CPU resources	Server memory
~1 billion	~36GB	2 vCPUs Medium usage	32GB High usage

Test query run time : ~30 seconds

A common application activity involved querying the billion row table that SQL Server performed using an expensive parallel clustered index scan. Consequently, the server’s storage was used heavily and the database server’s memory was fairly ineffective . Perhaps surprisingly, the server’s CPU usage was always fairly low.

This style of this workload made it a good candidate for trialling SQL Server 2014’s Clustered ColumnStore Index feature so we proceeded with a proof of concept.

ColumnStore index proof of concept – Testing

Having converted the large table from a traditional clustered index to a ColumnStore clustered index, we had the following environment:

Table rows	Table size	Server CPU resources	Server memory
~1 billion	~3GB	2 vCPUs V. high usage	32GB Low usage

Test query run time : ~200 seconds

The table above shows four observations significant to this PoC:

Table size dropped massively
Query run time increased massively
SQL Server’s CPU workload increased massively
SQL Server’s memory usage drop massively

ColumnStore index proof of concept – Tuning

Using the above observations, we concluded that:

Query performance was no longer being affected by memory availability or storage performance
The Batch Mode query processing used by ColumnStore indexes needed a lot more CPU resource

For the purpose of this PoC, we then made the following changes:

Table rows	Table size	Server CPU resources	Server memory
~1 billion	~3GB	8 vCPUs High usage	6GB High usage

Test query run time : ~5 seconds

The table above shows four observations significant to this PoC:

Query run time reduced massively from both the traditional and first ColumnStore tests
CPU usage was still high with 8 vCPUs, but 8 was the point of diminishing returns
SQL Server needed very little memory

ColumnStore index proof of concept – Conclusions

The PoC was a very fast paced trial of a new SQL Server feature with the objective of understanding its potential relevance to the client’s environment. However, our observations went beyond the “you’ll need less disk space” conclusion we expected. Instead, what we learnt was that:

The performance gains with the right amount of CPU resource were far greater than expected
Migrating to ColumnStore indexes and their Batch Mode queries moved large read-only queries from being memory bound to CPU bound
Host server CPU resource availability for the virtual database servers would become an important deployment consideration again after several years of being able to assume SQL Server was a low CPU workload
Using ColumnStore indexes wouldn’t be a quick fix in an existing transactional database design with referential integrity in place

ColumnStore indexes are a great tool in the toolbox when looking to solve performance issues; however, their performance doesn’t depend on storage and memory any more – it depends on something potentially far more expensive – CPU resource.

View full post