UNION ALL returning incorrect results?
I recently came across an interesting behaviour when executing queries that included a UNION ALL statement, the examples below were created using SQL Server 2016 Service Pack 1.
The statement used a UNION ALL statement to combine two SELECT queries from different tables and aggregate the data; when the order of the SELECT statements was reversed however, the data outputted by SQL Server changed. If you’re familiar with Set Theory this should not be possible, I have created example data below to show how this was happening:
First I created two tables, one to track physical purchases and another to track digital purchases of our customers, I then inserted some values into each table to use as example data:
CREATE TABLE dbo.DigitalPurchases ( ); CREATE TABLE dbo.PhysicalPurchases ( |
It’s important to note here that the columns “Price” and “Quantity” have been deliberately mixed up in that in the DigitalPurchases table the column order is “Price” then “Quantity” whereas in PhysicalPurchases the order is “Quantity” then “Price”.
|
After inserting the values above into the two tables we can start joining this data together and find the aggregated data that we’re looking for using a derived table, for example if we want to find the total quantity of products sold:
SELECT SUM(CombinedPurchases.quantity) |
Or swapping the select statements as it shouldn’t matter which order we combine the data in:
SELECT SUM(CombinedPurchases.quantity) |
Firstly, it’s odd that these two queries returned different results but secondly, we can see from the data that the returned results are also both incorrect as the actual total quantity of our sold items is 13.
So why is this happening? Well, when we are combining our two queries using SELECT *, SQL Server is comparing the columns in the order that we originally created the tables in. This means that the “Price” column of the first query is being matched to the “Quantity” of the second query and vice versa.
To correct this, we should always list the columns that we want to match deterministically as below:
SELECT SUM(CombinedPurchases.quantity) |
We can also verify that the correct data is being outputted by running the inner UNION ALL query which will now return the expected data:
This may seem like an edge case however when reference tables are being used as part of a database design to connect information together it becomes much easier for columns to be misaligned and produce incorrect results as you can see above.
Struggling with a tricky SQL Server issue? We can diagnose problems and optimise performance with our Data Platform Healthcheck: