InfoPath Document


This example demonstrates the ability of SSIS to interact with an external Microsoft Office application, namely InfoPath 2003.

Note

Microsoft InfoPath 2003 is a desktop forms client that provides a rich interface to XML-based documents. For more information about Microsoft InfoPath 2003, see Professional InfoPath 2003 by Ian Williams and Pierre Greborio (Wiley, 2004).

Using a document created from the Timecard template supplied with InfoPath 2003, you will import portions of data stored in an InfoPath document and output the results to a comma-delimited flat file.

Portions of imported data in this demonstration appear disconnected in the Data Flow task. This example demonstrates an SSIS method to join disconnected data. This example also covers some troubleshooting.

Note

Many thanks to Wenyang Hu for permission to reuse some elegant XSL!

Create a new Integration Services project in SSIS. Drag an XML task onto the Control Flow. Double-click the XML task to open the editor. Configure the XML task as follows:

  • Operation Type: XSLT.

  • Source Type: File Connection.

  • Source: New File Connection. Configure the New File Connection as follows:

    • Usage Type: Existing File.

    • File: Click Browse to locate and select an InfoPath timecard directory and file. The timecard file may be generated using the InfoPath 2003 Timecard template, or you can get the Timecard_ARay.xml file from the Resources.

  • Save Operation Result: True.

  • Overwrite Destination: True.

  • Destination Type: File Connection.

  • Destination: New File Connection. Configure the New File Connection as follows:

    • Usage Type: Create File.

    • File: Use the same InfoPath directory containing the Timecard files. File name: TimecardResult.xml.

  • Second Operand Type: File Connection.

  • Second Operand: New File Connection. Configure the New File Connection as follows:

    • Usage Type: Existing File.

    • File: SSISInfoPath.xsl (Wenyang Hu's XSL file — see Resources).

Note

Why transform the Timecard XML file? Because the SSIS XML task does not support multiple namespaces.

Click OK to proceed as shown in Figure 17-10.

image from book
Figure 17-10

Now take a brief look at some XML task properties before moving on.

The top property of the task, OperationType, defines the remaining properties. The XML Task Editor changes to present different properties for different OperationTypes. There are six OperationTypes:

  • Diff: Creates a Diffgram (an XML document consisting of the differences between two XML documents) XML document from the differences between the XML defined in the Source property and the XML defined in the SecondOperand property.

  • Merge: Adds XML defined in the SecondOperand property to the XML defined in the Source property.

  • Patch: Adds a Diffgram defined in the SecondOperand property to the XML defined in the Source property.

  • Validate: Validates the XML defined in the Source property by the XML Schema Definition (XSD) or Document Type Definition (DTD) defined in the SecondOperand property.

  • XPath: Specifies an XPath query in the SecondOperand property executed against the XML defined in the Source property to evaluate or aggregate or to return a node or value list. For more information about using XPath, see XPath 2.0 Programmer's Reference by Michael Kay (Wiley, 2004).

  • XSLT: Applies XML Stylesheet Language (XSL) documents defined in the SecondOperand property to the XML defined in the Source property. For more information about XSLT, see XSLT 2.0 Programmer's Reference, 3rd Edition, by Michael Kay (Wiley, 2004).

The OperationResult property defines the output of the XML task. The DestinationType property can be set to File Connection or Variable, requiring a corresponding Connection Manager or Variable, respectively, to be assigned to the Destination property.

To generate the TimecardResult.xml file, you must execute this task.

Note

This is also a good development practice: Create a task and then test it before moving on. You may find that you cannot accomplish what you wish with this type of task, and this discovery may impact downstream development decisions.

Right-click the XML task and click Execute Task. You may receive a validation error in the Errors window — especially on the first execution of the task. If everything is configured properly, however, the task will succeed and the TimecardResult.xml file will be created in the Timecards directory.

Drag a Data Flow task onto the Control Flow. Connect the XML task to the Data Flow task using the available Precedence Constraint (the green arrow on the XML task). Double-click the Data Flow task to proceed. Drag an XML Source onto the Data Flow and double-click it to edit. Browse to the location of TimecardResult.xml — generated in a previous step — to configure the XML Location parameter. Click the Generate XSD button to automatically generate a schema definition for the file (this is such a timesaver!) as shown in Figure 17-11. Click OK to proceed.

image from book
Figure 17-11

Note

If you receive the error "Unable to infer the XSD from the XML file. The XML contains multiple namespaces" while stepping through this example, make sure you are using the SSISInfoPath.xsl file supplied in Resources. If you are adapting this example, make sure your transformation eliminates multiple namespaces from your source XML.

Drag a merge join onto the Data Flow. Merge joins, discussed in Chapter 4, are designed to join rows of data from disparate sources. This example uses them to join disconnected data from the same source: the same XML file. The desired result is one row from the file containing the employee name and information about the work week.

Note

This merge could be accomplished many other ways; this is an example of the flexibility of SSIS.

In order to join the disconnected data, the merge join needs a field upon which to join. To create this field, drag two Derived Column transformations onto the Data Flow. Connect the XML Source to one of the Derived Column transformations. Select timecard_employee_name from the Output drop-down list on the Input Output Selection dialog box. Connect the XML Source to the other Derived Column transformation, and select Week as the Derived Column input. Double-click each Derived Column in turn to open their respective editors. Configure the same Derived Column for each as follows (as shown in Figure 17-12):

  • Derived Column: <add as new column>

  • Derived Column Name: JoinID

  • Expression: 1

  • Data Type: 4-byte signed integer [DT_I4]

image from book
Figure 17-12

Click OK to proceed. You can now connect the outputs of the Derived Column transformations to the Merge Join transformation, except for one thing: The merge join requires the input data to be sorted. So drag and drop two Sort transformations onto the Data Flow. Connect the output of each Derived Column transformation to a respective Sort transformation. Double-click each Sort transformation to configure it. Select the JoinID for each Sort, allowing all other columns to pass through the transformation, as shown in Figure 17-13.

image from book
Figure 17-13

Connect the outputs of each Sort to the Merge Join transformation. Assign the output of the first Sort transformation to the merge join. When prompted, select Merge Join Left Input as the input for the first Sort output — the second will connect by default to the remaining available input. Double-click the Merge Join transformation to edit it. Make sure that Inner Join is selected in the Join Type drop-down list and that the Join Key checkbox for each JoinID field is checked in the Available Columns tables. Check the Select checkbox for the JoinID columns in the join, as shown in Figure 17-14. Click OK to close the editor.

image from book
Figure 17-14

Add a Flat File Destination to the Data Flow and connect the merge join to it. Double-click the Flat File Destination to configure it. Click the new Flat File Connector to configure a new file destination named TimeCardOutput. Select Delimited as the Flat File Type. The Flat File Connection will be created, and the Flat File Connection Manager Editor will display. Click Browse to choose a File Name and enter TimecardOutput.csv. Click OK to return to the Flat File Destination Editor. Click the Mappings item to generate the column mappings as shown in Figure 17-15, and then click OK to close the editor.

image from book
Figure 17-15

Right-click each Sort output, respectively, and click Data Viewers. Make sure Data Viewers is selected in the left pane of the Data Flow Path Editor, and then click the Add button. Accept the default name of the Data Viewer and make sure Grid is selected on the General tab of the Configure Data Viewer wizard. Click OK to add a Data Viewer to the Sort output. Add another Grid Data Viewer to the output of the merge join. Test the Data Flow by clicking the Play button and observing the results. In this instance, the error shown in Figure 17-16 is detected during package validation.

image from book
Figure 17-16

To troubleshoot this problem, open the Data Flow and double-click the XML Source transformation. An Editing Component dialog box (shown in Figure 17-17) displays a message that the component is not in a valid state.

image from book
Figure 17-17

This dialog box references the same field identified in the validation error and then asks, "Do you want the component to fix these errors automatically?" Clicking the Yes button resolves this issue but creates two more. Red X's display downstream in each Sort transformation, but these are easily resolved by editing and resetting them to the desired state. The error next propagates to the Merge Join transformation, but again it is addressed by resetting to the desired state previously defined. After these items are addressed, the package validates and runs when the Play button is clicked, displaying the data in Data Viewers (an example is shown in Figure 17-18) as it does so.

image from book
Figure 17-18

This example demonstrated some techniques for importing and filtering a subset of data from an XML document. You used an InfoPath-generated XML document as the source, but the approach to the solution is valid for loading any XML document into SSIS.



Professional SQL Server 2005 Integration Services
Wireless Java : Developing with Java 2, Micro Edition
ISBN: 189311550X
EAN: 2147483647
Year: 2006
Pages: 182

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net