In this entry I would like to share the steps I follow when configuring SSIS to fetch data from SAP.
First step: Which data to fetch?
That should be the very first question of all: which data should I get from SAP? What are the table and column names? Which filters should be applied? I will not cover the process of business requirement gathering here, as that would be a completely different post. Let’s assume the requirements are clear, that we know what report to generate and the good reasons behind it, and focus instead on the technical part of the process.
Usually the table names come from my SAP expert workmates. They are the ones developing ABAP code, configuring and extending SAP, and doing the full maintenance. When presented with the business requirements, they come back to me with the tables to query and a (sometimes incomplete) list of column names. It is the responsibility of the BI developer/analyst to decide which columns and filters to apply to meet the business requirements. Let’s use a real example to better understand the process:
Recently I had a request to create a simple report that shows “Parked Documents”. SAP-aware readers will know what a parked document is; for the rest, let’s just say it is a special kind of document. Let’s also assume that the report is as simple as getting the information out of SAP and making it accessible from an Excel spreadsheet/pivot table.
The Parked Documents information in SAP resides in a table called VBKPF, which contains 78 columns. As is usually the case in SAP, we don’t need all the columns, just a few of them. To find out what each column means, log in to SAP; in my case I use the SAP Logon application and enter the transaction code se16 to access the Data Browser:
Press Enter and you will reach the VBKPF table selection screen:
To view which columns this table has and what they mean, go to Settings -> Fields for selection. The following new window will appear:
As you can see, each table column has its description next to it. Over time you will get used to the names and the long lists of available columns, and it will become easier to navigate through them.
In this window you could already decide on the majority of the columns to query, but there is something else you can do: query the table directly from SAP and inspect the columns that have been configured by default for that report. To do so, close this last window, go back to the previous screen (the VBKPF table selection screen) and click the first icon below the “Data Browser: Table VBKPF: Selection Screen” banner, the one with a clock and a green check mark. This icon executes the default report for the VBKPF table and returns the first 500 rows:
If you would like to know the technical name of any of the columns returned by this report, just right-click its column header and select the Choose option from the pop-up menu. For example, if we look up the ‘Document Type’ report column, its name turns out to be BLART:
By now we should know which fields to get from SAP. The next step is to import those fields into SQL Server.
Second step: Import data into SQL Server
The obvious next step would be to configure SSIS to fetch the data from SAP into SQL Server, but I like to add an intermediate step that helps me avoid pitfalls and move a bit faster: I use the Import Data task in SQL Server Management Studio to fetch the SAP data into SQL Server for the first time. The reason is that SQL Server will then create the destination table for us with the correct data types, which is faster than creating the table ourselves. Here are the steps:
Open SQL Server Management Studio, right-click on your destination database and select the Import Data… option in the Tasks submenu.
This will open the SQL Server Import and Export Wizard window. Choose the SAP data provider as the data source and fill in the connection information; normally the ‘Application Server Host’, ‘System Number’, ‘Client’, ‘Password’ and ‘User name’ fields:
Click the Next button until you reach the window where you enter the SQL statement (there is no data browser available for the SAP data provider). Here you will enter a normal T-SQL(ish) query:
This query is written in a trimmed-down version of the T-SQL language. A lot of functionality is missing (for example, you cannot do a SELECT DISTINCT). For more information you can check the SELECT syntax for the SAP data provider here: http://msdn.microsoft.com/en-us/library/cc185153(v=bts.10).aspx
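As an illustration, a query for our VBKPF scenario might look like the sketch below. The column names come from the table we scripted later in this post; the GJAHR filter value is purely hypothetical, so adjust it to your own system:

```sql
-- Hypothetical example query for the SAP data provider.
-- Only a subset of T-SQL is supported, so keep it simple:
-- a plain column list, FROM and a WHERE filter.
SELECT BELNR, BKTXT, BSTAT, BLART, BLDAT, BUDAT,
       CPUDT, CPUTM, GJAHR, MONAT, USNAM, TCODE
FROM VBKPF
WHERE GJAHR = '2011'   -- hypothetical fiscal year filter
```

Selecting only the columns you actually need keeps the transfer from SAP as small as possible.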
Click Next to go to the Tables and views wizard window. In this window I like to change the name of the destination table to match the SAP name, VBKPF in our example.
Click the Edit Mappings… button to change the column data types that the wizard will create automatically for you. I like to change some of the default types, especially the date and time types. For example, the BUDAT column (Posting Date) will appear as a smalldatetime type by default, but SAP records only the date in this field, no time information. I therefore change the type from smalldatetime to date for all the “date” columns:
For the “time” columns (values like 10:42:35) the wizard identifies them as int, due to the SAP Data Provider conversion. We should change them to the time type to map the values properly (for example, the CPUTM column).
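If you ever need to do this conversion yourself in T-SQL (for instance in a staging query), and assuming the provider delivers the time as an HHMMSS-style integer such as 104235 — an assumption on my part, so verify against your own data — a sketch could look like this:

```sql
-- Convert an HHMMSS integer (e.g. 104235) into a time(0) value.
-- Assumes the int encodes hours, minutes and seconds as HHMMSS.
DECLARE @cputm int = 104235;

SELECT CAST(
         STUFF(                                           -- insert second ':'
           STUFF(RIGHT('000000' + CAST(@cputm AS varchar(6)), 6),
                 5, 0, ':'),                              -- '1042:35'
           3, 0, ':')                                     -- '10:42:35'
         AS time(0)) AS cputm_as_time;                    -- 10:42:35
```

The RIGHT(...) padding handles early-morning values such as 92210 (09:22:10) that would otherwise be one digit short.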
Finally, we can click Next until we reach the end of the wizard and execute the query. I like to execute the query right away to detect any data type conversion errors early.
Third step: Automate the data import in SSIS
To keep this post shorter and clearer, we will assume that each time we import the data from SAP we get all the available data, not only the delta and the updates. In some cases this scenario is valid, as the data contained in the queried SAP table is small and changes frequently (several times per day). In those cases I like to configure my SSIS packages as follows:
Inside a Sequence Container, add an Execute SQL Task and a Data Flow Task. The Execute SQL Task will simply drop and recreate the destination table in SQL Server, VBKPF in our example, deleting all previous records. We could have used DELETE or TRUNCATE instead, but I prefer DROP and CREATE because it simplifies deployments between environments: it ensures the table always has the expected structure. To get the DROP and CREATE script, just right-click the VBKPF table created earlier by the wizard in SQL Server Management Studio and select the Script Table as -> DROP And CREATE To -> Clipboard option:
This puts the script on your clipboard; you can now paste it into the SQLStatement field of the Execute SQL Task in SSIS. Important: remember to delete the lines with the USE command (highlighted below). The Execute SQL Task also has a Connection field where you specify the connection manager to use.
/****** Object: Table [dbo].[VBKPF] Script Date: 06/07/2011 10:56:13 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[VBKPF]') AND type in (N'U'))
DROP TABLE [dbo].[VBKPF]
/****** Object: Table [dbo].[VBKPF] Script Date: 06/07/2011 10:56:13 ******/
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
CREATE TABLE [dbo].[VBKPF](
	[BELNR] [nvarchar](10) NOT NULL,
	[BKTXT] [nvarchar](25) NOT NULL,
	[BSTAT] [nvarchar](1) NOT NULL,
	[BLART] [nvarchar](2) NOT NULL,
	[BLDAT] [date] NOT NULL,
	[BUDAT] [date] NOT NULL,
	[CPUDT] [date] NOT NULL,
	[CPUTM] [time](0) NOT NULL,
	[GJAHR] [nvarchar](4) NOT NULL,
	[MONAT] [int] NOT NULL,
	[USNAM] [nvarchar](12) NOT NULL,
	[USCMP] [nvarchar](12) NOT NULL,
	[USREL] [nvarchar](12) NOT NULL,
	[USUPD] [nvarchar](12) NOT NULL,
	[UPDDT] [date] NOT NULL,
	[AEDAT] [date] NOT NULL,
	[TCODE] [nvarchar](20) NOT NULL
) ON [PRIMARY]
The Data Flow Task contains one OLE DB Source and one OLE DB Destination component:
In the OLE DB Source component just copy and paste the query used in the Import Data task in SQL Server Management Studio:
In the OLE DB Destination component choose the VBKPF table in SQL Server created by the Import Data task in SQL Server Management Studio (how convenient!) and configure the following properties:
- Data access mode: Table or view – fast load
- Rows per batch: 10,000
- Maximum insert commit size: 10,000
The Rows per batch (RPB) and Maximum insert commit size (MICS) properties let us limit how many rows get inserted into SQL Server at once. It is important to configure these numbers to avoid inserting, say, 10 million rows in a single batch/transaction, which would require a vast amount of memory, hold a long table lock, etc., and hurt the performance of the server. Personally I like to set them to 10,000 rows, which is the default process buffer size in SSIS tasks.
Remember that if you leave these two properties at their default values, all rows returned by the query will go into a single batch. For the more technical readers: if MICS is set to 5,000 and you have 9,000 rows, and an error is encountered within the first 5,000 rows, the entire batch of 5,000 will be rolled back. MICS equates to the BATCHSIZE argument of the BULK INSERT T-SQL command. RPB is merely a hint to the query optimizer and should be set to the actual expected number of rows; it equates to the ROWS_PER_BATCH argument of BULK INSERT (http://msdn2.microsoft.com/en-us/library/ms188365.aspx).
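For reference, this is roughly how those two knobs look in plain T-SQL with BULK INSERT. The file path and format are hypothetical, just to show where each option goes:

```sql
-- Hypothetical staging example; 'C:\staging\vbkpf.dat' is an assumed path.
-- Maximum insert commit size maps to BATCHSIZE: rows committed per transaction.
BULK INSERT dbo.VBKPF
FROM 'C:\staging\vbkpf.dat'
WITH (BATCHSIZE = 10000);

-- Rows per batch maps to ROWS_PER_BATCH: an optimizer hint with the
-- approximate total number of rows in the source, used instead of
-- BATCHSIZE when you want one transaction but a better plan:
-- BULK INSERT dbo.VBKPF
-- FROM 'C:\staging\vbkpf.dat'
-- WITH (ROWS_PER_BATCH = 10000);
```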
After all these steps and configurations we have created a robust SSIS package that imports data from SAP in a clear and straightforward way 🙂