Making SQL sense

+44 (0)20 3051 3595 info@coeo.com
coeoClose

Making SQL sense

+44 (0)20 3051 3595 info@coeo.com

Careers

We're looking for people who share our commitment to excellence in Microsoft's data platform to join us consultants working on exciting business intelligence, analytics, and SQL Server projects on-premises and in the cloud.

Executing Packages with SSIS Scale Out

The Coeo Blog

In my previous post Getting Started with SSIS Scale Out I discussed the use cases for SSIS Scale Out and how to install and configure it. In this post I'll look at how to execute packages in Scale Out, depending on your type of workload and security considerations.

Scheduling SSIS package execution is most commonly done using SQL Server Agent jobs. You can either use a job step of type "SQL Server Integration Services Package" to select a package to execute, or you can use T-SQL to call the catalog stored procedures catalog.create_execution, catalog.set_execution_paramters, and catalog.start_execution. The first option is probably the easier to set up, and is much more common, but it won't always be able to give you the flexibility you need when executing in scale out.

If you have lots of disparate processes and packages that are unrelated and independent of each other, then enabling scale out will work for you without having to change anything on your existing jobs.

However, there are two key limitations in just using the SQL Agent to execute packages directly.

  • There isn't a way in the SQL Server agent job to configure a specific job step to run in scale out or to have it execute on a specific worker. To do that you need to set the Server-wide Default execution mode to Scale Out in the SSISDB Catalog Properties, and then all of your jobs will run in scale out on any of your configured workers.
  • If your packages use the Execute Package Task (common where you have a master package executing lots of child packages) there is no option to execute the package in scale out meaning that all the child packages will execute on the same worker that the master package is executed on.

If you are using master packages to handle execution of a set of packages you will probably have two main drivers; simplicity of management, and parallel execution. You can move directly to using scale out and retain both of these things, but you won't necessarily gain anything. If your workload is a single end to end load, all handled by a single master package (for example a data warehouse ETL load that pulls data from multiple source tables, stages the data, transforms the data and then loads to the destination), then all you are going to achieve is moving the same workload to a different server.

In order to benefit from scale out you would need to execute each package separately. If you do that in a single agent job with multiple steps you will lose your parallelism. You could have multiple jobs all executed at the same time, but that just feels wrong, and will get hard to maintain and monitor, especially if you have lots of agent jobs already.

The other important issue is around security. I'm not necessarily comfortable with an all or nothing approach to enabling scale out and workers server wide. I don't necessarily want all my workers to have access to all the resources that any of the packages might need. I might want to group them into sets of workers or have one or two workers really locked down with minimal permissions on very specific resources. Those workers can even be in a different domain as long as you've set the permissions up correctly in the packages that you're going to execute.

If you have these types of workload then you're going to have to use T-SQL to execute your packages.

EXEC [SSISDB].[catalog].[create_execution]
     @package_name=MyPackageName, @execution_id=@execution_id OUTPUT, @folder_name=MyFolderName, @project_name=MyProjectName,
     @use32bitruntime=False, @reference_id=MyEnvironmentId,
     @useanyworker= @useanyworker, @runinscaleout=True
EXEC [SSISDB].[catalog].[start_execution] @execution_id, @retry_count=0
 

Notice the two new parameters that are available, @runsinscaleout and @useanyworker. In this example @useanyworker is true, but you can specify a particular worker to use using the add_execution_worker stored procedure:

EXEC [SSISDB].[catalog].[create_execution]
     @package_name=MyPackageName, @execution_id=@execution_id OUTPUT, @folder_name=MyFolderName, @project_name=MyProjectName,
     @use32bitruntime=False, @reference_id=MyEnvironmentId,
     @useanyworker= @useanyworker, @runinscaleout=True
EXEC [SSISDB].[catalog].[add_execution_worker] @execution_id,  @workeragent_id='DB600260-E9D6-49EB-881A-9DECE223C1D2'
EXEC [SSISDB].[catalog].[start_execution] @execution_id, @retry_count=0
 

You can even add multiple workers by calling the add_execution_worker procedure as many times as you need to.

When called like this, the package is executed asynchronously. This means that if you have multiple packages, you can execute them one after the other, and they will execute without waiting for the previous one to complete.

You probably don't want to hard code the package names, especially if there are a lot of them, and the code is going to grow and grow to an unmanageable mess after a while.So rather than do that, you should build a list of packages and then loop through it executing each one. Yes, I'm going to use a cursor, this being a suitable use case for one!

CREATE TABLE #Packages
(
     [PackageName] [nvarchar](255) NOT NULL
);
INSERT INTO #Packages(PackageName)
VALUES ('MyPackage1'), ('MyPackage2'), ('MyPackage3');
 
DECLARE Packages CURSOR LOCAL FAST_FORWARD FOR
     SELECT PackageName FROM #Packages ORDER BY PackageName;
OPEN Packages

FETCH NEXT FROM Packages
INTO @PackageName

WHILE @@FETCH_STATUS = 0
BEGIN
     EXEC [SSISDB].[catalog].[create_execution] 
          @package_name=MyPackageName, @execution_id=@execution_id OUTPUT, @folder_name=MyFolderName,
          @project_name=MyProjectName, @use32bitruntime=False, @reference_id=MyEnvironmentId,
          @useanyworker= @useanyworker, @runinscaleout=True
     EXEC [SSISDB].[catalog].[start_execution] @execution_id, @retry_count=0

     FETCH NEXT FROM Packages
     INTO @PackageName
END

CLOSE Packages
DEALLOCATE Packages
 

I would go one step further and build an ETL control table, create a stored procedure for my package execution code and then call it with a parameter. Now I only have one piece of code to maintain, every package execution is consistent, and new packages can be added and scheduled with minimal overhead. You can see the full stored procedure here.

Happy scaling!

 

 

Subscribe to Email Updates