Implementing Informatica's PowerCenter to Load Data in SAP BW
As stated previously, SAP has certified data warehouse products from several data warehouse vendors to load data in SAP BW using Staging BAPIs. I implemented solutions using products from Acta Technology, Hummingbird, and Informatica Corporation that extracted data directly from an Oracle (Siebel) database and a Financial Analysis data mart in Microsoft SQL Server to populate InfoCubes in SAP BW. Each product that I worked with has its own strengths and limitations.
In this chapter, I use PowerCenter from Informatica Corporation to demonstrate Staging BAPI implementation. This demonstration is not to be considered an endorsement or recommendation for Informatica products or other vendors. Appendix A, "Data Warehouse Industry References and SAP BW Training," provides information on products from Acta Technology, Hummingbird, and Informatica.
Informatica Corporation is a leading data warehouse construction tool vendor. PowerCenter 1.5 is an enterprise data integration hub that enables large organizations to transform legacy, relational, and ERP data into reliable information for strategic business analysis. The Integration Server for SAP BW uses SAP BW Staging BAPIs to load data in SAP BW. Though Informatica provides several components to support a wide range of data sources and targets to construct a data warehouse (see details in Appendix A), the following section describes only how to use PowerCenter 1.5 Integration Server to load data in SAP BW 1.2A and 1.2B.
Though you can install PowerCenter Designer, Repository, and Integration Server on UNIX or Windows NT servers, for this example, I've installed all PowerCenter components on one Microsoft Windows NT 4.0 server. The repository was based on Microsoft SQL Server 6.5. This NT server had 256 MB RAM, 266 MHz dual Intel processor, and SAP GUI 4.5 full-development environment.
Figure 14-4 shows components of Informatica's Integration Server environment for SAP BW.
Figure 14-4: Components of PowerCenter Integration Server for SAP BW.
Implementing data loads in SAP BW is a four-step process, as listed here:
Create third-party data source specific definitions for Informatica PowerCenter in SAP BW.
Define RFC connection in SAP BW for Informatica PowerCenter and create and define data extraction and load scheme.
Connect the PowerCenter Integration Server to SAP BW.
Load data in SAP BW.
Create Data Source Definitions for PowerCenter in SAP BW
To create a new data source definition in SAP BW, select the Source systems tab in the Administrator Workbench (see Figure 14-5). Then right-click Source systems and select Create Source System.
Figure 14-5: Defining Informatica PowerCenter as Data Source Provider in SAP BW.
From the Select Source System Type option, select Ext.System, data and metadata transfer-staging. Then enter the source system and its description, as shown in Figure 14-6. In this example, a source system named PC_SOURCE is used to get data from PowerCenter. After entering source information, click the Check button.
Figure 14-6: Creating the Source System in SAP BW.
Defining an RFC Connection in SAP BW for Informatica PowerCenter
The next step in completing the external data source definition is to set up an actual RFC connection definition. SAP provides several connection types to access remote programs; but for SAP BW Staging BAPIs implementations of Informatica PowerCenter, use connection type T and start an external program via TCP/IP.
When you click the check mark button, shown in Figure 14-6, the system opens an RFC destination definition window, transaction SM59. Enter a textual description for this connection definition. Enter Program ID as PC_SRC. SAP BW uses this program ID to communicate with the PowerCenter Integration Server for SAP BW. The RFC destination definition process is the same in both BW 1.2B and BW 2.0A.
Notice that the program ID is case sensitive, and the value you enter must be the same value you enter in the SAPRFC.INI file used by the PowerCenter Integration Server.
As of SAP R/3 release 3.0D, an RFC server program, for example, PowerCenter Integration Server for SAP BW, can be registered with the SAP gateway and wait for incoming RFC requests.
After entering the program ID PC_SRC, click the Registration button, shown in Figure 14-7. Registering means telling the SAP RFC gateway that an external program, PC_SRC, will communicate with SAP BW.
Figure 14-7: Establishing the RFC Definition for the PowerCenter Environment for SAP BW.
You can actually verify this remote connection by clicking Test connection, as shown on the top left of Figure 14-7. However, at this stage it will give you a connection error because the PowerCenter Integration Server for SAP BW has not been started. I discuss this process in the next section.
This test connection option is a good way of resolving any connection or networking-related problems. Results of the Test connection option show how long SAP BW takes to communicate with the external program, in this case, PowerCenter Integration Server. If connection time is less than 20 milliseconds (see Figure 14-8), your data transfer will be acceptable. Otherwise, check with the networking team to reconfigure the remote server so it has fewer hops to speed up the network level connection between the SAP BW server and the PowerCenter Integration Server.
Figure 14-8: Testing the Connection between SAP BW and the PowerCenter Integration Server for SAP BW. Connection Time is in Milliseconds.
After defining this data source in SAP BW, the rest of the process using a data source is the same as if data were coming from SAP R/3 OLTP. For testing purposes, create an InfoSource for transaction and master data, assign PC_SOURCE as a data source, and define transfer and communication structures. Only the transfer structure for Active InfoSources will be visible in PowerCenter as one data target per InfoSource.
Connecting the PowerCenter Integration Server to SAP BW
To establish a connection from the PowerCenter Integration Server to SAP BW, you need to define the SAP BW source definition in the SAPRFC.INI file. This file usually exists in the Windows NT system directory, C:\WINNT\SYSTEM32. The SAPRFC.INI file has the following parameters per data source:
DEST: A Destination identifier (or logical name) for a data source/target that will be used in the PowerCenter repository to connect to a data source/target.
TYPE: Identifies the connection type. Here set it to R, meaning an RFC connection. Details on connection types and an example of SAPRFC.INI are provided in Appendix B, "SAP BW and SAP R/3 Transactions, Tables, and Code Examples."
PROG_ID: The program ID; here you use PC_SRC as the program ID.
ASHOST: Host name or IP address of the SAP BW application server.
SYSNR: SAP R/3 (BW) system number.
If you need to connect to multiple SAP R/3 or SAP BW data sources, you need to define additional destinations in the SAPRFC.INI file. For example, the SAPRFC.INI file to connect the SAP BW instance has the following parameters:
/*=========================================*/ /* SAPRFC.INI file for Integration Server */ /* The IP addresses are dummy addresses */ /*=========================================*/ /* Type R: Register an RFC server program at an SAP gateway */ /* or connect to an already registered RFC server program */ /* The PowerCenter Integration Server for BW uses */ /* this information */ */ /*============================*/ DEST=BWSERVER TYPE=R PROGID=PC_SRC GWHOST=188.8.131.52 GWSERV=sapgw00
After developing SAP BW loading schemes in PowerCenter, you must make sure that such interfaces work against test BW instances before going live in production. This is done by simply changing the IP address of the target SAP BW instance in SAPRFC.INI that points to the target BW instance (development, test, and production). Never change the DEST name in the SAPRFC.INI file after defining a data extraction scheme in PowerCenter for a specific InfoSource. The PowerCenter repository stores the DEST name in its repository and not the IP addresses. The DEST name in SAPRFC.INI is similar to the logical data source name in SAP BW.
Importing the SAP BW InfoSource in PowerCenter
Before importing data in SAP BW using InfoSource TESTPC, you need to define the data sources and data target in the PowerCenter Designer to define the mapping and transformation needed before loading in SAP BW.
In this example, a flat file is used as the data source that contained product royalty data. Using the PowerCenter Warehouse Designer tool, define the Royalty file structure and import metadata of activated InfoSources, as shown in Figure 14-9.
Figure 14-9: Importing InfoSource Metadata From SAP BW into a PowerCenter Repository as the Target Data Source.
When you select Import From SAP BW, you are asked to enter SAP BW logon information. The connect string is actually the destination value of SAP BW, as defined in the SAPRFC.INI file-in this case, BWSERVER, as shown in Figure 14-9. After successful logon to SAP BW, you see a list of transfer structures of active InfoSources. Now you simply drag and drop transfer-structure definitions into the Targets folder of the ROYALTY project. The dotted line shows the drag-and-drop path. Repeat the same process to import additional InfoSources, if any, and save the project.
After defining data sources and targets in PowerCenter, the next step is to define mapping and transformations. It is here you build field-level relationships between data sources and targets. To do so, click the Mappings folder in the project tree and create a new mapping scheme; for example, MappingtoBW. Then drag and drop sources and targets in the mapping work area. PowerCenter automatically relates fields between sources and targets using the field-ordering scheme. If this is not what you need, you can build your own field mapping by simply linking a source field to a target field of your choice, as shown in Figure 14-10. Behind each link, you can define several methods to cleanse, compute, or convert data for your target source.
Figure 14-10: Defining Source to Target Data Mapping and Transformations in PowerCenter.
It is important to note that PowerCenter will do needed data type conversions for the target data sources. For more information on how PowerCenter handles and converts SAP BW data types, look in the Informatica documents on the CD-ROM accompanying this book.
Defining an SAP BW Connection in the PowerCenter Server Manager
Due to the distributed deployment architecture of Informatica products, individual components were designed to have their own repositories. However, when all such components share one repository, you still need to redefine the database connection scheme for individual components. In the future, individual component management logic may change so that when an entity has been defined in the repository, it is automatically available to other components.
The previous section defined the SAP BW connection for the PowerCenter Warehouse Designer. Now you need to redefine the same SAP BW connection for the PowerCenter Server Manager. It is here that you define ETL jobs, called sessions. These sessions are called from the SAP BW Scheduler to do data extraction/transformation and to ship the transformed data to SAP BW.
From within the PowerCenter Server Manager menu, select the Connect to Repository option and choose the Server Configuration-Database Connection option. Then add SAP BW as target database BWSERVER, add the DEST parameter value in SAPRFC.INI as a data source, and save the definition. Now PowerCenter Server Manager knows all the registered data sources and targets. However, to define the data extraction scheme, you need to associate a session to a specific data transformation or mapping scheme. This is done in the PowerCenter Server Manager.
First you connect to a repository from the Server Manager main menu and choose Operations-Add session. There you enter the mapping name, for example MappingtoBW. Save this session information and remember the session name; for example, s_S260File_Mapping. You will need this session name to schedule this job from SAP BW. All defined sessions are listed in the Organizer window, as shown in Figure 14-11.
Figure 14-11: PowerCenter Server Manager, Used to Define Sessions and to Monitor and Control Jobs.
Here the Session Wizard asks a few more questions on how and where to execute this session. Enter the server name on which you want to run this job, as well as the target data source. For SAP BW, select the data target type as ERP and select the data processing mode as INSERT rows. This completes the session definition.
Loading Data in SAP BW
Scheduling and loading data is done from within an SAP BW instance. PowerCenter Server Manager is capable of starting data extraction jobs, but when the data target is an SAP BW instance, you cannot start sessions from the PowerCenter Server Manager. You launch such sessions only from within SAP BW via the PowerCenter Integration server for SAP BW. You can, however, stop or monitor active jobs from the PowerCenter Server Manager when they are in progress.
Loading data in SAP BW is a two-step process, as listed here:
Start PowerCenter Integration Server for SAP BW.
Schedule data load jobs from SAP BW Scheduler.
Start PowerCenter Integration Server for SAP BW
This integration server is a Staging BAPI Listener. It is started on a Windows NT 4.0 server. SAP BW Scheduler sends a job startup request to the PowerCenter Server Manager for a specific saved mapping. Figure 14-13 shows how SAP BW Scheduler and PowerCenter Integration Server exchange BAPI calls to process external data.
To start the PowerCenter Integration Server, you need to know the SAP BW destination and the PowerCenter Repository user name and password. Figure 14-12 shows the command line interface to start the PowerCenter Integration Server on Windows NT 4.0. BWSERVER is the SAP BW destination defined in SAPRFC.INI; pmrepo is the repository name and password followed by the TCP/IP port number for this listener. Once started, PowerCenter Integration Server waits to receive requests from SAP BW.
Figure 14-12: Starting the PowerCenter Integration Server for BW.
Schedule Data Load Jobs From SAP BW Scheduler
To schedule a data load import, first create an InfoPackage request against the InfoSource. Then schedule the InfoPackage.
The only difference in the SAP BW Scheduler definition is that you will see a new tabstrip called 3rd Party selections, as shown in Figure 14-13. This shows only when the InfoSource is based on Staging BAPI implementation of third-party ETL tools.
Figure 14-13: Scheduling Data Import Jobs From SAP BW.
Select the 3rd Party selections tabstrip. Enter the session name you have defined in the PowerCenter Server Manager for specific data mapping and transformations. Remember that you created a session called s_S260File_Mapping in an earlier session. Enter this session name in the value field, as shown in Figure 14-13.
Now schedule this job in SAP BW. The Scheduler connects to the PowerCenter Integration Server and hands over the session name. The PowerCenter Integration Server then starts the session, s_S260File_Mapping, on the server as defined in the PowerCenter Server Manager. After data transformations, PowerCenter sends data to SAP BW. The data is processed in SAP BW as methods defined in the transfer rules or as defined data targets-ODS, InfoCube, or both-during job scheduling in SAP BW.
While this job is in progress, you can monitor the data transformation status from the PowerCenter Server Manager, as shown in Figure 14-12. To monitor data load activity in SAP BW, use a typical InfoPackage Scheduler Monitor to see the number of data load activities.
Other Third-Party Staging BAPI Implementations
In general, all SAP BW-certified vendors work the same-that is, they support the Staging BAPI Listener; SAP BW triggers data extraction jobs, and the tools send data back to SAP BW and to an InfoSource. However, they do differ based on algorithms to process data using proprietary technologies.
Each product has strengths and weaknesses. For example, ActaWorks from Acta Technology and Genio from Hummingbird provide robust integration with SAP BW. The Genio engine provides a fast data load technology, and ActaWorks seems to be very easy to implement and support in production and has an excellent data extraction environment for SAP R/3. Information on other products that load data in SAP BW is available on the accompanying CD-ROM.