The Problem
Not too long ago I was assisting a client with an issue they were experiencing whilst running some compliance software to check for sensitive data. Sensitive data was picked up in the SQL Server data file but when the DBAs checked, the data was nowhere to be found in the database.
It's fairly widely known that just because you delete data from a database does not mean it is gone from the data file itself, but the question was how should they go about removing it? They had already tried shrinking the file and executing sp_clean_db_free_space but it had made no difference.
Deleting Rows vs Dropping Columns
The procedure sp_clean_db_free_space removes ghost records; rows that have been marked as deleted but not physically removed. This process runs periodically anyway but executing this stored procedure will manually set off the clean-up. You can check how many ghost records you have by running the query below. If its high be careful as there was a bug in earlier versions of SQL Server that meant the automatic clean-up was not working.
SELECT * from sys.dm_db_index_physical_stats(DB_ID(),NULL,NULL,NULL,’DETAILED’)
In this case, however, the ghost data was low (less than 100) and the amount of sensitive data was large compared to it. It transpired that the data was in a column that was dropped, and data removed by dropping a column behaves very differently than when a row is deleted. Because the data in a table is stored in rows, marking a whole row for removal is much easier than marking a column in every row. When you drop column it can be quick, almost instant, this is because it is only marked for removal at the meta data level. As it is done in this way the data is not marked as a ghost record.
The Options
If you have Enterprise Edition you can enable (and then disable if you wish) TDE, which is a good option to wipe the data out. Enabling TDE causes all the pages to be read and encrypted, which won't account for data outside of this file but will do a pretty good job as far as the database is concerned.
If you don't have Enterprise Edition however there are other options. The first would be to select all the data into a new table and then drop the old one. The new table could then be renamed back to the original one. This will again leave ghost records but these will be cleaned out.
A slightly less inconvenient option would be to rebuild the clustered index before running sp_clean_db_free_space. This appears to remove the data in most situations and also applies to rows that have been deleted that appeared in a non clustered index. During the rebuild all the pages are read / written and there is plenty of data movement so it makes sense this would work. In the case of this particular client the column had been dropped from a heap table so a clustered index had to be added and then dropped, but with the same outcome. Unfortunately if you don't know the original table this would mean rebuilding all of them before running the clean (unless you’re a whiz at reading MDFs).
While this method did work in this particular case it's not guaranteed, there is always a remote possibility that a page is not moved or rewritten and the data stays in place. It does however highlight the way the internals are handled when dropping data. This also does not cover the rest of the disk the database sits on; if you were to drop the database or remove a file there may be residual data here too and the only way to do it would be to scrub the device securely.
Example
Below is the script if you'd like to try these out for yourself. To look for the data in the mdf you need to take the database offline then copy the file somewhere to open it. This example was run in 2016 but it should work in earlier versions too.
/*****************************************************************
* Rollback
*****************************************************************/
use master
GO
ALTER DATABASE [CleanDataTest] SET ONLINE
GO
ALTER DATABASE [CleanDataTest] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
DROP DATABASE [CleanDataTest]
GO
/*****************************************************************
* Initial Setup
*****************************************************************/
use master
GO
CREATE DATABASE [CleanDataTest]
GO
USE [CleanDataTest]
GO
CREATE TABLE [CleanDataTestTableRow] (ID INT IDENTITY, SensitiveData VARCHAR(10))
GO
CREATE TABLE [CleanDataTestTableColumn] (ID INT IDENTITY, SensitiveData VARCHAR(10))
GO
INSERT INTO [CleanDataTestTableRow]
SELECT 'ABC123'
GO 10
INSERT INTO [CleanDataTestTableColumn]
SELECT 'XYZ456'
GO 10
DELETE FROM [CleanDataTestTableRow]
GO
ALTER TABLE [CleanDataTestTableColumn] DROP COLUMN SensitiveData
GO
--If you Open the mdf file in notepad (you may have to copy / paste it first) you should find both ABC123 and XYZ456
--ALTER DATABASE [CleanDataTest] SET OFFLINE WITH ROLLBACK IMMEDIATE
--GO
/*****************************************************************
* Remove rows
*****************************************************************/
sp_clean_db_free_space 'CleanDataTest'
GO
--If you Open the mdf file in notepad (you may have to copy / paste it first) you should find just XYZ456
--ALTER DATABASE [CleanDataTest] SET OFFLINE WITH ROLLBACK IMMEDIATE
--GO
/*****************************************************************
* Remove columns using index rebuild or add/ drop on a heap
*****************************************************************/
ALTER TABLE [CleanDataTestTableColumn] ADD CONSTRAINT PK_CleanDataTestTableColumn_ID PRIMARY KEY(ID)
GO
ALTER TABLE [CleanDataTestTableColumn] DROP CONSTRAINT PK_CleanDataTestTableColumn_ID
GO
sp_clean_db_free_space 'CleanDataTest'
GO
--If you Open the mdf file in notepad (you may have to copy / paste it first) you should not find the data
--ALTER DATABASE [CleanDataTest] SET OFFLINE WITH ROLLBACK IMMEDIATE
--GO
/*****************************************************************
* Remove both using TDE
*****************************************************************/
USE master
GO
--CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'abcxyz123456'
IF NOT EXISTS(select * from sys.certificates where name = 'TestCert')
CREATE CERTIFICATE TestCert WITH SUBJECT = 'Test Certificate'
GO
USE [CleanDataTest]
GO
CREATE DATABASE ENCRYPTION KEY WITH ALGORITHM = AES_128 ENCRYPTION BY SERVER CERTIFICATE TestCert;
GO
ALTER DATABASE [CleanDataTest] SET ENCRYPTION ON
GO
--If you Open the mdf file in notepad (you may have to copy / paste it first) you should not find the data
--ALTER DATABASE [CleanDataTest] SET OFFLINE WITH ROLLBACK IMMEDIATE
--GO