In-memory ColumnStore Indexes in SQL Server 2014 can be a performance game changer for workloads involving large datasets. They can also significantly affect the amount of server resource needed to provide good query performance – but not always in the ways you’d expect.
A familiar database server performance issue
It’s common to see a transactional system’s database with lots of relatively small tables and a small number of very large tables, such as history logs or archives. The activity on those large tables is often mostly small inserts with occasional large read-only queries; those queries are frequently the database server’s most expensive workloads, consuming all of its processor and memory resources. Maintaining consistent application performance under such extreme variations in workload is a common problem that many organisations have unsuccessfully tried to solve by simply adding more server resources.
SQL Server ColumnStore indexes
New in SQL Server 2012 was one of Microsoft’s in-memory technologies, ColumnStore indexes. For the purpose of this article, they’re an alternative method for storing and querying very large amounts of SQL Server table data which, in the right situation, is a lot more efficient than SQL Server’s traditional row-based methods. In SQL Server 2012, these indexes weren’t updatable, which mostly limited their use to data warehouse scenarios. In SQL Server 2014, they became updatable, allowing their use to spread beyond the read-only world and towards transactional systems.
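As a sketch of what that conversion looks like (the table and index names here are illustrative, not from any real system), SQL Server 2014 lets you replace an existing rowstore clustered index with an updatable clustered ColumnStore index in a single statement:

```sql
-- Convert a hypothetical history table from a rowstore clustered index
-- to a clustered columnstore index in one operation.
-- DROP_EXISTING = ON drops the existing clustered index as part of the build.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_TransactionHistory
    ON dbo.TransactionHistory
    WITH (DROP_EXISTING = ON);
```

With a clustered ColumnStore index there is no separate rowstore copy of the data; the index *is* the table’s storage, which is where the large size reductions discussed below come from.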
The link below is to a presentation I gave two years ago about the then recently released ColumnStore technology: here
ColumnStore index proof of concept – Starting point
Recently, I was helping a client evaluate the new features in SQL Server 2014 including ColumnStore indexes. One of their application systems had exactly the problem I described above, specifically the following environment:
| Table rows | Table size | Server CPU resources | Server memory |
| --- | --- | --- | --- |
| ~1 billion | ~36GB | 2 vCPUs | 32GB |

Test query run time: ~30 seconds
A common application activity involved querying the billion-row table, which SQL Server performed using an expensive parallel clustered index scan. Consequently, the server’s storage was used heavily and the database server’s memory was fairly ineffective. Perhaps surprisingly, the server’s CPU usage was always fairly low.
This style of workload made it a good candidate for trialling SQL Server 2014’s Clustered ColumnStore Index feature, so we proceeded with a proof of concept.
ColumnStore index proof of concept – Testing
Having converted the large table from a traditional clustered index to a ColumnStore clustered index, we had the following environment:
| Table rows | Table size | Server CPU resources | Server memory |
| --- | --- | --- | --- |
| ~1 billion | ~3GB | 2 vCPUs | 32GB |

Test query run time: ~200 seconds
The table above shows four observations significant to this PoC:
- Table size dropped massively
- Query run time increased massively
- SQL Server’s CPU workload increased massively
- SQL Server’s memory usage dropped massively
ColumnStore index proof of concept – Tuning
Using the above observations, we concluded that:
- Query performance was no longer being affected by memory availability or storage performance
- The Batch Mode query processing used by ColumnStore indexes needed a lot more CPU resource
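A quick way to confirm Batch Mode is in play is to run a representative aggregate with the actual execution plan captured and inspect each operator’s “Actual Execution Mode” property, which shows either Batch or Row. The query below is an illustrative sketch only (the table and columns are hypothetical, not the client’s); note also that in SQL Server 2012 and 2014 Batch Mode is only used in parallel plans, which partly explains the extra CPU appetite:

```sql
-- Illustrative large read-only aggregate over a hypothetical history table.
-- With a clustered columnstore index, the columnstore scan and hash
-- aggregate operators should report Batch as their execution mode in the
-- actual execution plan.
SELECT  YEAR(TransactionDate)  AS TxYear,
        COUNT(*)               AS TxCount,
        SUM(Amount)            AS TxTotal
FROM    dbo.TransactionHistory
GROUP BY YEAR(TransactionDate);
```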
For the purpose of this PoC, we then made the following changes:
| Table rows | Table size | Server CPU resources | Server memory |
| --- | --- | --- | --- |
| ~1 billion | ~3GB | 8 vCPUs | 6GB |

Test query run time: ~5 seconds
The table above shows three observations significant to this PoC:
- Query run time reduced massively from both the traditional and first ColumnStore tests
- CPU usage was still high with 8 vCPUs, but 8 was the point of diminishing returns
- SQL Server needed very little memory
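In the PoC the virtual machine itself was resized, but for completeness, SQL Server’s own memory ceiling can also be capped directly with the standard `sp_configure` options. The sketch below uses the PoC’s 6GB figure; adjust it for your own environment:

```sql
-- Cap SQL Server's buffer pool at ~6GB for testing.
-- 'max server memory (MB)' is an advanced option, so advanced options
-- must be shown first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 6144;
RECONFIGURE;
```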
ColumnStore index proof of concept – Conclusions
The PoC was a very fast-paced trial of a new SQL Server feature with the objective of understanding its potential relevance to the client’s environment. However, our observations went beyond the “you’ll need less disk space” conclusion we expected. Instead, what we learnt was that:
- The performance gains with the right amount of CPU resource were far greater than expected
- Migrating to ColumnStore indexes and their Batch Mode queries moved large read-only queries from being memory bound to CPU bound
- Host server CPU availability for the virtual database servers would again become an important deployment consideration, after several years of being able to assume SQL Server was a low-CPU workload
- Using ColumnStore indexes wouldn’t be a quick fix in an existing transactional database design with referential integrity in place
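To illustrate that last point (the constraint and table names below are hypothetical), SQL Server 2014 doesn’t allow PRIMARY KEY, FOREIGN KEY or UNIQUE constraints on a table with a clustered ColumnStore index, so converting an existing transactional table means dropping its declared referential integrity first:

```sql
-- SQL Server 2014: these constraints must go before the clustered
-- columnstore index can be created -- the database loses its declared
-- referential integrity on this table as a result.
ALTER TABLE dbo.TransactionHistory DROP CONSTRAINT FK_TransactionHistory_Product;
ALTER TABLE dbo.TransactionHistory DROP CONSTRAINT PK_TransactionHistory;

CREATE CLUSTERED COLUMNSTORE INDEX CCI_TransactionHistory
    ON dbo.TransactionHistory;
```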
ColumnStore indexes are a great tool in the toolbox when looking to solve performance issues; however, their performance doesn’t depend on storage and memory any more – it depends on something potentially far more expensive – CPU resource.