For a while I had an SSIS package being unable to complete in the production environment but which worked perfectly in development. To make matters more interesting the package will not always fail… oh! “Welcome back!” That was what I was hearing in my head over and over again.
So I decided to investigate a bit and my findings led me to the following screenshot:
So here we have the problem. Production server was screaming ‘Not enough memory!’ like StarCraft game when saying ‘Not enough minerals’…
First reaction from my side: Let’s go and ask for more memory ^_^
On a second thought I dig a bit more and if you read the first entry in the log you will see that it mentions that the Sort Input component was the one complaining. So it turned out that there was not enough free memory to allocate half a million rows in memory to do a Sort transformation (needed as there are several Merge transformations on that SSIS package later). FYI, Sort transformations require to load all the records in memory to do their job, thus eliminating the benefits of the cascade architecture that SSIS is so proud of.
Ok, so now I have the troublemaker identified and with the help of computer IT’s best friend, Google, I found an ok solution to the problem: Use ORDER BY when querying your data source and configure the data source task to identify the ordering.
First thing that you need is to add an ORDER BY order to your SQL command when fetching your data (more on this later). So in my case it looks like SELECT …blabla… FROM X ORDER BY EBELN ASC. Being EBELN the field used in the Sort transformation.
Then close this task editor and right click on it and choose Show Advanced Editor…
In the Advanced Editor go to the ‘Input and Output Properties’ tab and click on OLE DB Source Output. Change ‘IsSorted’ property to True. This marks the output dataset as sorted.
Last but not least you need to tell which columns and in which order the dataset is ordered by. Open the Output Columns folder and set the ‘SortKeyPosition’ property to 1, 2, 3… for the columns included in the ORDER BY command. Note that 0 is the value by default and marks that column as not part of the ORDER BY.
As in my case I have only one column, EBELN, in the ORDER BY command I only need to set the SortKeyPosition property equal to 1 in the EBELN column. If your ORDER BY looks like ORDER BY EBELN, EKGRP, LIFNR you will set up the SortKeyPosition to 1 for EBELN, to 2 for EKGRP and to 3 for LIFNR.
Ok, now that I have explained how to the trick let me explain why is just an ok solution. The main reason is that this solution will only work if your data source supports an ORDER BY command, so mainly databases. As I explained before my main data source is SAP. When I designed the BI architecture I created a staging database between SAP and the BI database (the one that feeds directly the cubes). The idea was to minimize the time querying SAP and to have a common set of data to later build the fact and dimension tables. Well now I found a new good reason to have a staging database. The SAP adapter provided by Microsoft doesn’t support ORDER BY, so you need first to load the data in the staging database and from there to your BI database and this time using ORDER BY.
If you Google a bit you will find commercial SSIS transformations that sort records using temporary files on disk. I really like that solution but you need to pay for it, and we don’t really like to pay for stuff, do we?