SQL Server 2014 SP 2 introduced an interesting feature to provide a data free copy of a production database.
A little late to the party, I was using SQL Server 2016 CU1 before starting my investigation. Nevertheless, I was excited to see what DBCC CLONEDATABASE had to offer. Typing out this fresh syntax and clicking F5 gave:
Incorrect DBCC statement. Check the documentation for the correct DBCC syntax and options.
#sqlhelp
The unwelcome red text informed me of a typo. After double and triple checking, I did what any good DBA does and took to Twitter:
Ever #helpful, Twitter soon notified me that the command had been removed from the product in SQL Server 2016 RTM.
SP1
Fast forward 3 short months and the release of SP1 (https://www.microsoft.com/en-us/download/details.aspx?id=54276) for SQL Server 2016. In addition to many appealing announcements, DBCC CLONEDATABASE (https://support.microsoft.com/en-gb/kb/3177838) made a welcome return.
DBCC CLONEDATABASE will create a copy of a database without the data but containing data distribution statistics.
Create Statistics
To demonstrate, I will create 1,000,001 customers, 1 of whom is inactive, and a stored procedure to select them.
USE CloneDemo;
DROP TABLE IF EXISTS dbo.Customer;
GO
CREATE TABLE dbo.Customer(
CustomerID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED,
Active CHAR(1) NOT NULL,
Created DATE NOT NULL);
GO
INSERT INTO dbo.Customer(Active,Created)
SELECT TOP 1000000 'Y','20160101'
FROM DBA.dbo.Numbers n1 CROSS JOIN DBA.dbo.Numbers n2;
GO
INSERT INTO dbo.Customer(Active,Created) VALUES ('N','20160101');
GO
CREATE NONCLUSTERED INDEX ix_Customer_Active ON dbo.Customer (Active);
GO
DROP PROCEDURE IF EXISTS dbo.CustomerSelect;
GO
CREATE PROCEDURE dbo.CustomerSelect (@Active CHAR(1)) AS
SELECT CustomerID,
Active,
Created
FROM dbo.Customer pp
WHERE pp.Active = @Active
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
EXECUTE dbo.CustomerSelect @Active = 'Y';
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
Clone
A copy of CloneDemo is created via
DBCC CLONEDATABASE ('CloneDemo','CloneDemo_Copy');
Resulting in a read-only database named CloneDemo_Copy and successful output of:
Note the warning "not supported for use in a production environment". It is a good idea to backup and restore the cloned database to a non-production server for further analysis.
Statistics Retained
Repeating the customer select query
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
EXECUTE dbo.CustomerSelect @Active = 'Y';
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
DBCC SHOW_STATISTICS
USE CloneDemo_Copy;DBCC SHOW_STATISTICS('dbo.Customer','ix_Customer_Active');
Viewing the statistics in the cloned database reveals the production data distribution even though no data is present
No data means some data
Although no user data exists in the cloned database, good security practice still needs to be implemented.
Table and column names could be vital information for a would-be hacker to attempt a data breach elsewhere.
The bucket values within statistics contain the actual data values. In our example, this is simply 'N' or 'Y' to indicate customer status. This could just as easily be our customer names or email addresses so data governance for cloned databases should be as strict as elsewhere.
Use cases
From https://support.microsoft.com/en-gb/kb/3177838, "DBCC CLONEDATABASE should be used to create a schema and statistics only copy of a production database in order to investigate query performance issues."
You should proceed with caution before using this method for others means. Objects are copied from the source database with some limitations as listed at the same link. An example is "Encrypted objects get copied but aren’t supported in clone in this release of SQL Server."
If production statistics are not required, my preference for creating a database with no data would be a "get" from a current source code control system.
Nevertheless, some reasons why cloning a database might prove useful are:
- Test query execution plans in non-production environments using representative statistics.
- To provide developers with a copy of the live schema without sensitive data. This could be combined with a third-party tool to generate realistic test data after making the database read-write.
- Create multiple test and development environments where disk space is at a premium.
- Run Microsoft Data Migration Assistant (https://www.microsoft.com/en-us/download/details.aspx?id=53595) to test your schema for compatibility issues prior to migration.
- Comparison of the live schema to source code control.
- Testing schema modification scripts after making the database read-write.
Summary
DBCC CLONEDATABASE is a useful addition to the DBA toolset. It can quickly create a production schema copy without requiring large amounts of I/O, time and disk space.
The database schema can be copied without exposing the most valuable of assets, the data.