Long story short: I have a SSIS data transformation task which takes some data from a table and inserts it in a different table.
So far so good. But the problem is that the data source in my transformation relies on a query which gets data from a table with several million rows and in reality it takes some 10 seconds before the first data row gets into the SSIS flow.
Think about it: the SSIS itself does not start processing anything before the first chunk of data has arrived in the pipeline.
So, what should I do?
Solution: there is a query option which allows us to force a query to return a set of the first N rows as fast as possible and then the rest of the dataset.
Our query might look like this:
1 2 3
select column1 from Table1 OPTION (FAST 10)
This query will return the first 10 rows to the client almost instantly and then the rest of the rows will come as fast as the execution plan allows it.
Why is this important: it is important because if I do not use the OPTION (FAST N) my SSIS transformation will be waiting for the first data set and will not be processing anything. I.e. I have package run duration of query execution + SSIS tasks execution.
In the second case if I get the first set of data immediately then the SSIS package can start processing immediately and can potentially be much faster than the case when we do not use the query hint.
BUT, be careful because when you use a query hint OPTION (FAST N) the execution plan might be much slower than the one without the query hint. So, make sure to tune your queries and test the process.
Here is another problem, though, and for this one I need your help:
Let’s say that I have a CTE which gets my data into the SSIS pipeline. This works without a problem, but if I would like to apply the OPTION (FAST N) to the CTE it does not work quite well.
Let’s say I have a table and some data like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
-- Create an Employee table. CREATE TABLE dbo.MyEmployees ( EmployeeID smallint NOT NULL, FirstName nvarchar(30) NOT NULL, LastName nvarchar(40) NOT NULL, Title nvarchar(50) NOT NULL, DeptID smallint NOT NULL, ManagerID int NULL ); -- Populate the table with values. INSERT INTO dbo.MyEmployees ( EmployeeID ,FirstName ,LastName ,Title ,DeptID ,ManagerID ) SELECT 1, N’Ken’, N’Sánchez’, N’Chief Executive Officer’,16,NULL UNION ALL SELECT 273, N’Brian’, N’Welcker’, N’Vice President of Sales’,3,1 UNION ALL SELECT 274, N’Stephen’, N’Jiang’, N’North American Sales Manager’,3,273 UNION ALL SELECT 275, N’Michael’, N’Blythe’, N’Sales Representative’,3,274 UNION ALL SELECT 276, N’Linda’, N’Mitchell’, N’Sales Representative’,3,274 UNION ALL SELECT 285, N’Syed’, N’Abbas’, N’Pacific Sales Manager’,3,273 UNION ALL SELECT 286, N’Lynn’, N’Tsoflias’, N’Sales Representative’,3,285 UNION ALL SELECT 16, N’David’,N’Bradley’, N’Marketing Manager’, 4, 273 UNION ALL SELECT 23, N’Mary’, N’Gibson’, N’Marketing Specialist’, 4, 16 GO 1000000000 (ATTENTION HERE!!!)
And then let’s say that I would like to run the following simple CTE:
1 2 3 4 5 6 7 8
;WITH cte (EmployeeID, ManagerID, Title) as ( SELECT EmployeeID, ManagerID, Title FROM dbo.MyEmployees ) SELECT EmployeeID, ManagerID, Title FROM cte OPTION (FAST 1000)
Well, this does not work for me, since the inner part of the CTE is the one that gives me the delay, not the outer SELECT.
I can try to run this, but it will not work.
1 2 3 4 5 6 7 8
;WITH cte (EmployeeID, ManagerID, Title) as ( SELECT EmployeeID, ManagerID, Title FROM dbo.MyEmployees OPTION (FAST 1000) ) SELECT EmployeeID, ManagerID, Title FROM cte
What should I do? How do you return the first N rows of a CTE fast?