I'm mulling over some design decisions that will probably result in my using Table-Valued Parameters (TVPs) for the first time. Let's see what you think.
Please note: this post applies to SQL 2008 and above. No real differences in the restrictions or behavior in SQL 2012, AFAICS.
The problem to solve
We have a complex SSIS flow that handles batch imports, with a large number of validation steps on staging tables before the eventual consumption of the correctly transformed, and correctly reduced, set of rows by production tables.
For various reasons, I need to reproduce this logic without using SSIS, and without using staging tables, in some circumstances.
An external application will invoke a single "master process" method, which in turn will invoke all the logic necessary to handle the original SSIS behavior.
I want the sequence of steps to remain faithful to the original SSIS process, to ensure that data can be consumed both ways with the same results.
For the purpose of performance considerations, please understand that the SSIS process handles large-scale imports, whereas this new one is light-weight. I don't have to worry about bulk processing in the new design. In fact, the expected use cases for it are real-time integrations with only a single row being passed in -- although I'd like my methodology to handle multiple rows, in small numbers, as well.
I like the idea of creating a custom or user-defined table type for this purpose. Rather than manipulating a staging table, it seems ideal to move the data into a variable shaped like that staging table, passing it down the line to additional steps, massaging it as I go, until eventually the TVP is ready for upsert.
What's the big deal?
When you pass a TVP into a called function or a called sproc, however, you must use the READONLY keyword. You can't massage the contents of the TVP in the called code and then see the changes reflected in the parent procedure.
I guess another way to say this (if you're a .NET programmer) is that it's not like passing by reference in the .NET world -- even though you are passing by reference in the sense that a copy of the data is not made, and you do reap perf benefits from that fact ordinarily. (As you will see, I'm probably not going to be reaping that benefit in this solution.)
Still another way to say this (if you're a T-SQL guy) is "you can't declare a TVP as an OUT param or, if you could, it wouldn't make any difference."
So what's the best way to resolve this?
Moving a TVP down a line
At first I thought I would declare and fill the TVP and then pass it into a series of functions. Each function will do what it has to do, and return a TVP of the same type back. Naturally the return value, declared by and manipulated within the function, is readwrite for the function. The master proc would have code something like this:
-- use a function to update for each step,
-- like this:
SET col2 = tvp2.col2
FROM dbo.TestUpdateTVPByFunction(@TestTVPInstance) tvp2
ON tvp1.col1 = tvp2.col1;
There are two problems with this approach:
1 - There are a lot of things you just can't do in functions, including dynamic SQL and using temp tables. Each step in the current process does a lot of "stuff", in many cases using techniques not allowed in functions. The further I modify the original code for the new process from the original, the more difficult it will be to get exactly consistent results.
2 - Looking at the above code, what you don't see is the (large) number of columns to be UPDATEd in the real code. Given the (large) number of steps there are all together, each with its clumsy UPDATE statement, the master process starts to look like an ungainly mess. It will be really difficult to follow the flow. Contrast that with the nice visual presentation we had in the SSIS flow, and it's not a happy thought. I want some structure in the code that gives you something of the sense of the process, just as the SSIS Designer provides it. Well, near as I can get.
So, I changed it up.
At the moment, I'm looking at a much cleaner presentation in the master proc, using a series of TVPs in the master procedure, each of which is the input to one step and results in the filling of another one. The code looks like this at the top:
DECLARE @TestTVPInstance as dbo.TestTVP,
@TestTVPInstance_2 as dbo.TestTVP,
-- lots more here...
@TestTVPInstance_N as dbo.TestTVP;
The first step fills the first TVP using the input data, which is coming in from the external application, and of course is not a TVP itself (it's actually XML, but that's another blog post).
Thereafter, the procedure's steps look something like this (with more descriptive proc names, obviously) ;
-- use a procedure to manipulate
-- and bring back into a second copy:
INSERT INTO @TestTVPInstance_2
EXEC dbo.TestUpdateTVPByProc @TestTVPInstance;
-- many, many more such calls... do lots of work...
INSERT INTO @TestTVPInstance_N
EXEC dbo.TestUpdateTVPByProc_N @TestTVPInstance_N_minus_1;
-- followed by some work done to update a "real"
-- table using the final intermediate instance:
EXEC dbo.UpdateSomeTableWithComplicatedLogic @TestTVPInstance_N ;
As you can see, the master proc now has the potential to be neat and clear. If each of the sprocs were named in a way that mirrors the task names in the SSIS task flow you would also be able to see the correlation very clearly.
In the words of the masters...
That's so crazy it just might work.
Honestly, I'm still turning this around. I don't know if it is a really obvious idea, and everybody else does this already to use TVPs with some fluency, to get around the READONLY limitation, or if it's a really bad idea, and nobody else would do it.
It doesn't get around all the limitations of TVPs; you can't dot-qualify them so that you can use them across databases.
But it seems like it allows TVPs to function, in more scenarios, more the way I think they should.
What's your take?