Non-fix width flat file

Recently I had to get the data from a flat file into the data warehouse. The file is coming from a vwell known company that provides oil & gas product prices in the international markets. The file is fetched via FTP.

The format of the file is as follows:

As you can see this could be a fixed width type of file with the following columns and widths:

  • Column 0: Width 1: N
  • Column 1: Width 1: -blank space-
  • Column 2: Width 7: (material code) AAKBT00
  • Column 3: Width 1: (price type:  high, low, close) c
  • Column 4: Width 1: -blank space-
  • Column 5: Width 12: (datetime) 201105240000
  • Column 6: Width 1: -blank space-
  • Column 7: Width 6: (price) 120.48

Well after adding this configuration to the Flat File Connection Manager, I had the following result:

As you can see, only the first row is correct. The more that I tried to change the width value for the last column I couldn’t get the row values right.

This screenshot shows the result with Column 7 with a width value of 7 instead of 6:

The problem is that the file is not formatted properly, the last column doesn’t have a fixed width:

As the file was coming from an external FTP server I can’t influence the way the file is formatted at origin, so I need to find a solution once the file is fetched by the FTP task.

To solve this problem it was clear that I needed fixed width data rows from where I can extract the data. To have this width fixed data rows I configured the Flat File Connection Manager as follows:

This time I’m using the Delimited format type, with no text qualifier and a header row delimiter set to {CR}{LF}.

In the Advance section one column is declared with an OutputColumnWidth value of 50 and a ColumnDelimiter value of {CR}{LF}:

This configuration will return a fixed width column of 50 characters with one data source row contents in each row:

Now we can always assume that the last column width will be at least of 10 characters, so we can get the values out of it without jumping to the next row as it was happening before.

To extract the individual values from the rows and transform them into columns that we can use in SSIS, the Derived Column Transformation will help. With this transformation we can apply String functions, among others, to a data set column. Using the SUBSTRING function we can extract part of the contents of a row and create a derived column from it:

The last step will be to transform the new derived columns to the correct data type. For this I like to use a Script Transformation where you can use C# code for the data type transformation:

if (Row.DCPrice.Trim() != "") Row.DTCYDCPrice = System.Convert.ToDecimal(Row.DCPrice.Trim());
else Row.DTCYDCPrice = 0;
Row.DTDATEDCDate = new DateTime(System.Convert.ToInt32(Row.DCDate.Substring(0, 4)), System.Convert.ToInt32(Row.DCDate.Substring(4, 2)), System.Convert.ToInt32(Row.DCDate.Substring(6, 2)));

At the end you will have a data set with the original column, the derived columns in string data type and some additional columns with data in the correct data type format. Now you can feed the data to your database without a problem.

Advertisements
Tagged with: , , , , ,
Posted in SSIS

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 24 other followers

Categories
%d bloggers like this: