The XML Bulk Load Component

As you've just seen, updategrams can be used to insert data in a SQL Server database. You might recall that you can also use the OpenXML function to add data to a table. Although both of these approaches are ideal for business processes that involve adding one, or a few, records, neither of them is ideally suited to importing a large volume of XML data.

To assist you in bulk load operations, the XML For SQL Server 2000 Web Release includes the XML bulk load component. This COM component can be used from any COM-aware programming language (such as Microsoft VBScript) to import XML data into SQL Server.

How the Bulk Load Component Works

To use the XML bulk load component, you define a mapping schema that matches the XML data you want to import with the tables and columns you want to load it into. The component then reads the XML data as a stream, inserting each row into the database as the corresponding XML element is read.

For example, consider the following annotated schema:

 <?xml version="1.0" ?> <Schema xmlns="urn:schemas-microsoft-com:xml-data"      xmlns:dt="urn:schemas-microsoft-com:datatypes"       xmlns:sql="urn:schemas-microsoft-com:xml-sql" >      <ElementType name="Catalog" sql:is-constant="1">         <element type="Category"/>     </ElementType>     <ElementType name="Category" sql:relation="Categories"> 
         <AttributeType name="CategoryID"/>         <AttributeType name="CategoryName"/>         <AttributeType name="Description"/>         <attribute type="CategoryID" sql:field="CategoryID"/>         <attribute type="CategoryName" sql:field="CategoryName"/>         <attribute type="Description" sql:field="Description"/>         <element type="Product">             <sql:relationship  key-relation="Categories"                 key="CategoryID"                 foreign-key="CategoryID"                 foreign-relation="Products"/>         </element>     </ElementType>        <ElementType name="Product" sql:relation="Products">         <AttributeType name="ProductID"/>         <AttributeType name="ProductName"/>         <AttributeType name="SupplierID"/>         <AttributeType name="QuantityPerUnit"/>         <AttributeType name="UnitPrice"/>         <attribute type="ProductID" sql:field="ProductID"/>         <attribute type="ProductName" sql:field="ProductName"/>         <attribute type="SupplierID" sql:field="SupplierID"/>         <attribute type="QuantityPerUnit"              sql:field="QuantityPerUnit"/>         <attribute type="UnitPrice" sql:field="UnitPrice"/>     </ElementType> </Schema> 

This schema maps each Category element in an XML document to a row in the Categories table, and each Product element to a row in the Products table. This schema could be used to import the following XML data into the Northwind database:

 <?xml version="1.0"?> <Catalog>     <Category Category          CategoryName="Scottish Foods"          Description="Traditional food from Scotland">         <Product Product              ProductName="Porridge"              Supplier              QuantityPerUnit="10 boxes x 20 bags"              UnitPrice="16"/>         <Product Product              ProductName="Haggis"              Supplier              QuantityPerUnit="12 Boxes"              UnitPrice="19"/>     </Category>     <Category Category          CategoryName="Scottish Drinks"          Description="Traditional drinks from Scotland">         <Product Product              ProductName="Single Malt Whisky"              Supplier              QuantityPerUnit="12 - 550 ml bottles"              UnitPrice="100"/>      </Category> </Catalog> 

When the bulk load component reads the XML data file, it uses the schema to determine the scope of each row to be inserted. In the case of the preceding catalog data, the bulk load component would read the first <Category> tag and begin constructing a new row for the Categories table. When the closing tag (</Category>) is read, the row will be written to the database.

Another interesting aspect of the way the bulk load component maps XML data to rows in the database is the way in which it handles nested relationships. In the catalog example, the schema declares that each Product subelement of a Category element maps to a row in the Products table. You define the relationship between the two tables by using the relationship annotation to join the tables on the CategoryID field. When the bulk load component reads a <Product> tag, it begins constructing a new row for the Products table. Notice, however, that the CategoryID field for the Products table isn't included as an attribute of the Product element. In cases like this, the bulk load component uses the relationship annotation to retrieve the appropriate value from the parent element (in this case, Category) and inserts that value into the new row. For this tactic to work, the key field for the parent element must be declared in the schema before the subelement representing the related table because the bulk load component reads the data as a stream, and if the key field hasn't been read by the time the closing tag for the subelement is read, the insert can't take place.

If the subelement contains an explicit key field value, however, use the value in the subelement instead of the key field in the parent. The CategoryID value could be repeated in the Product subelement, for example, in which case the order of the CategoryID attribute declaration and the Product subelement in the schema wouldn't matter.

Bulk Loading XML Data

If you want to bulk load XML data, you can use a SQLXMLBulkLoad object. You create an instance of this object programmatically by using the ProgID SQLXMLBulkLoad.SQLXMLBulkLoad. You then need to connect the component to the database either by setting the ConnectionString property of the SQLXMLBulkLoad object to a valid OLE-DB connection string or by setting the ConnectionCommand property of this object to an existing ADO Command object. Once connected, you can use the bulk load component to import XML data into the database by calling the Execute method of this object, which requires the path to the schema file and the XML data to be imported.

For example, the following code is the minimum required to import the data in the C:\CatalogData.xml file into the Northwind database using the C:\CatalogSchema.xml schema:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.Execute c:\CatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

The bulk load component also allows you to specify a stream rather than a file as the XML data to be imported, which makes it easy to retrieve data as an XML stream from one database and import it into another.

Bulk Loading from XML Fragments

You can import data from an XML fragment (an XML document with no root element) by specifying the XMLFragment property, as shown here:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.XMLFragment = True objBulkLoad.Execute c:\CatalogSchema.xml, c:\CatalogFragment.xml Set objBulkLoad = Nothing 

Maintaining Referential Integrity

When you import data into a database, it's important to maintain referential integrity by ensuring that no duplicate rows exist and that relationship constraints between tables are enforced. The bulk load component gives you a number of options to ensure that data can be imported without compromising the integrity of the database.

Enforcing Constraints

Constraints are rules that data in the database must obey. You can use constraints to ensure that values in a particular column meet a specified criterion or to enforce relationships between tables. You can use the CheckConstraints property of the SQLXMLBulkLoad object to specify whether the bulk load component should check constraints. If this component imports a large volume of data, checking constraints for each row inserted can impair performance. For this reason, the bulk load component doesn't check constraints by default when bulk loading XML data.

To force the bulk load component to check constraints, you can set the CheckConstraints property to True, as shown in this code sample:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.CheckConstraints = True objBulkLoad.Execute c:\CatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

If you attempt to insert a row that breaks a constraint when the CheckConstraints property is set to True, the bulk load component will return an error.

Ignoring Duplicate Keys

The foundation of any good database design is that there should be no duplicate key values in a table. By default, the bulk load component issues an error message if you attempt to insert a duplicate key, and the bulk load process halts. In some circumstances, however, you don't want your application to behave this way. For example, suppose that after performing the previous bulk load of Scottish produce, you need to update the catalog by inserting the new products in the following XML document:

 <?xml version="1.0"?> <Catalog>     <Category Category          CategoryName="Scottish Foods"          Description="Traditional food from Scotland">         <Product Product              ProductName="Smoked Salmon"              Supplier              QuantityPerUnit="5 Crates"              UnitPrice="16"/>         <Product Product              ProductName="Fresh Lobster"              Supplier              QuantityPerUnit="12 Boxes"              UnitPrice="19"/>     </Category>     <Category Category          CategoryName="Scottish Drinks"          Description="Traditional drinks from Scotland">         <Product Product              ProductName="Blended Whisky"              Supplier              QuantityPerUnit="12 - 550 ml bottles"              UnitPrice="100"/>      </Category> </Catalog> 

This document contains three new products (smoked salmon, fresh lobster, and blended whisky), but the categories (Scottish Foods and Scottish Drinks) already exist in the database. If we attempt to load this data, the operation will fail at the first hurdle because the database already has a category with a CategoryID of 99.

The solution to this problem is to set the IgnoreDuplicateKeys property of the SQLXMLBulkLoad object to True, as shown in this code:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.IgnoreDuplicateKeys = True objBulkLoad.Execute c:\CatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

When you set the IgnoreDuplicateKeys property to True, any inserts that would cause a duplicate key value will fail, but the bulk load process doesn't stop, which allows you to import the remaining data.

Inserting IDENTITY Column Values

As I explained earlier, IDENTITY columns are often used to automatically generate a unique key for each row in a table. In the Northwind database, the ProductID column in the Products table and the CategoryID column in the Categories table are both IDENTITY columns.

By default, if an XML document contains a value for an IDENTITY column, the value in the XML document will be used instead of an automatic value generated by SQL Server. You can change this behavior by setting the KeepIdentity property of the SQLXMLBulkLoad object to False, in which case the bulk load operation will ignore the value in the XML document and SQL Server will generate a new IDENTITY value for each row. The following code shows how to set the KeepIdentity property to False:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.KeepIdentity = False objBulkLoad.Execute c:\CatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

Be careful when you use the KeepIdentity property because it's an all-or-nothing choice that can sometimes lead to unexpected results. For example, you might want to insert some new catalog data and assign an IDENTITY value to each new product using the following code:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.CheckConstraints = True objBulkLoad.IgnoreDuplicateKeys = True objBulkLoad.KeepIdentity = False objBulkLoad.Execute c:\CatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

This code will create a new IDENTITY value for the products, but it will also create a new row for each Category element in the Categories table, with an IDENTITY column for each row. This means there are now two rows in the Categories table for Scottish Foods and for Scottish Drinks.

Inserting NULLs

Maybe you have a table that includes some columns that are not mapped in the schema. By default, the bulk load component inserts the column's default value for any column not mapped. If the unmapped column doesn't have a default value, the bulk load component inserts a NULL in that column. And if the unmapped column doesn't allow NULLs, you'll get an error message.

You can force the bulk load component to insert a NULL value for unmapped columns even when a default value is defined. To do this, simply set the KeepNulls property of the SQLXMLBulkLoad object to True, as shown in this code:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.KeepNulls = True objBulkLoad.Execute c:\CatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

Forcing a Table Lock

When bulk loading XML data, the bulk load component locks the table receiving the data for each individual insert. You can force the bulk load component to lock a table for the entire duration of the bulk load operation by setting the ForceTableLock property of the SQLXMLBulkLoad object to True.

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.ForceTableLock = True objBulkLoad.Execute c:\CatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

Using Transactions

You can force a bulk load operation to be transactional (in which case the whole import either succeeds or fails) by setting the Transaction property of the SQLXMLBulkLoad object to True. This setting makes the overall bulk load process slower by caching all inserts in a temporary file until the entire XML document has been read, but using this setting can be a very important way to maintain the integrity of your database in the case of a bulk load error.

You must not set the Transaction property to True when you're bulk loading binary data such as images.

You can control the location of the temporary file that the bulk load component creates for a transacted bulk load by setting the TempFilePath property of the SQLXMLBulkLoad object. If you're importing data to a SQL Server database on the local computer, this setting can take the form of an ordinary file path, such as c:\tempdata. However, in most circumstances, you won't actually be sitting at the SQL Server machine when you run the bulk load process, so you should use a Universal Naming Convention (UNC) network path. UNC paths take the form \\ServerName\ShareName. To use a share named Data on a server called DBServer1, for example, you would set the TempFilePath property to \\DBServer1\Data.

The following code shows how a transaction can be used in a bulk load operation:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.Transaction = True objBulkLoad.TempFilePath = "\\DBServer1\Data" objBulkLoad.Execute c:\CatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

In the preceding sample code, the bulk load component operates in its own transaction context. In the case of an error, the bulk load component issues a ROLLBACK command to abort the transaction. When you're using the ConnectionCommand property to piggy back on an existing OLE-DB connection, the transaction is controlled by the client application, which must explicitly commit or abort the transaction. The following code shows how to use the ADO Connection object's RollbackTrans and CommitTrans methods to control a transaction:

 On Error Resume Next Err.Clear Set objCmd = CreateObject("ADODB.Command") objCmd.ActiveConnection= _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.Transaction = True objBulkLoad.ConnectionCommand = objCmd objBulkLoad.Transaction = True objBulkLoad.Execute c:\CatalogSchema.xml, c:\CatalogData.xml If Err.Number = 0 Then     objCmd.ActiveConnection.CommitTrans     Set objBulkLoad = Nothing Else     objCmd.ActiveConnection.RollbackTrans End If 

The Transaction property must be set to True when you're using the ConnectionCommand property to connect to the database.

Logging Errors

You can log error messages from the bulk load component to a file by specifying the ErrorLogFile property of the SQLXMLBulkLoad object, as shown here:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.ErrorLogFile = "c:\BulkLoadErrors.xml" objBulkLoad.Execute c:\CatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

If an error occurs during the bulk load operation, the error message returned will be written to the file specified as an XML document that resembles the following example:

 <?xml version="1.0"?> <Error>     <Record>         <HResult>0x80040E2F</HResult>         <SQLState>01000</SQLState>         <NativeError></NativeError>         <ErrorState>1</ErrorState>         <Severity>0</Severity>         <Source>Microsoft OLE DB Provider for SQL Server</Source>         <Description>             <![CDATA[The statement has been terminated.]]>         </Description>     </Record>     <Record>         <HResult>0x80040E2F</HResult>         <SQLState>23000</SQLState>         <NativeError></NativeError>         <ErrorState>1</ErrorState>         <Severity>14</Severity>         <Source>Microsoft OLE DB Provider for SQL Server</Source>         <Description>             <![CDATA[Violation of PRIMARY KEY constraint                  'PK_TestCategories'. Cannot insert duplicate key in                  object 'TestCategories'.]]>         </Description>     </Record> </Error> 

The file contains a Record element for each error that occurred, with the most recent error at the beginning of the document. These elements can be useful for tracing problems during insert operations. The ImportCatalogData.vbs script in the Demos\Chapter8 folder on the companion CD can be used to import data into the Categories and Products tables in the Northwind database.

Generating the Database Schema

Sometimes you might want to perform a bulk load that creates the tables for the new data as part of the import process. You can achieve this by setting the SchemaGen property of the SQLXMLBulkLoad object to True, as shown here:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.SchemaGen = True objBulkLoad.Execute c:\GenCatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

Executing this script causes the tables mapped in the schema to be created if they don't already exist. (If they do already exist, the data is simply inserted into the existing tables.) The table definitions are based on the information in the mapping schema, in which the table and column names, together with the data types, can be declared. For example, you could use the following mapping schema to create a new SpecialProducts table:

 <?xml version="1.0" ?> <Schema xmlns="urn:schemas-microsoft-com:xml-data"      xmlns:dt="urn:schemas-microsoft-com:datatypes"       xmlns:sql="urn:schemas-microsoft-com:xml-sql">      <ElementType name="Catalog" sql:is-constant="1">         <element type="Category"/>     </ElementType>     <ElementType name="Category" sql:relation="NewCategories">         <AttributeType name="CategoryID" sql:datatype="int"/>         <AttributeType name="CategoryName"/>         <AttributeType name="Description"/>         <attribute type="CategoryID" sql:field="CategoryID"/>         <attribute type="CategoryName" sql:field="CategoryName"/>         <attribute type="Description" sql:field="Description"/>         <element type="Product">             <sql:relationship key-relation="NewCategories"                 key="CategoryID"                 foreign-key="CategoryID"                 foreign-relation="SpecialProducts"/>         </element>     </ElementType>        <ElementType name="Product" sql:relation="SpecialProducts">         <AttributeType name="ProductID" sql:datatype="int"/>         <AttributeType name="ProductName" sql:datatype="nvarchar(40)"/>         <AttributeType name="SupplierID" sql:datatype="int"/>         <AttributeType name="QuantityPerUnit"              sql:datatype="nvarchar(40)"/>         <AttributeType name="UnitPrice" sql:datatype="money"/>         <attribute type="ProductID" sql:field="ProductID"/>         <attribute type="ProductName" sql:field="ProductName"/>         <attribute type="SupplierID" sql:field="SupplierID"/>         <attribute type="QuantityPerUnit" sql:field="QuantityPerUnit"/>         <attribute type="UnitPrice" sql:field="UnitPrice"/>     </ElementType> </Schema> 

When you use this schema to import data with the SchemaGen property set to True, a new table will be created, based on the following Transact-SQL statement:

 CREATE TABLE SpecialProducts  (     CategoryID int NULL ,     ProductID int NULL ,     ProductName nvarchar (40),     SupplierID int NULL ,     QuantityPerUnit nvarchar (40),     UnitPrice money NULL ) 

By default, the data in the specified XML data source will be imported into the new tables. You can choose to override this behavior by setting the BulkLoad property of the SQLXMLBulkLoad object to False, as shown here:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.SchemaGen = True objBulkLoad.BulkLoad = False objBulkLoad.Execute c:\GenCatalogSchema.xml Set objBulkLoad = Nothing 

Notice that when the BulkLoad property is set to False, you can omit the data source parameter to the Execute method. This code simply creates the tables defined in the schema if the tables don't already exist; no data will be imported.

Specifying a Primary Key

The tables created using the SchemaGen property use the information in the mapping schema to define the columns. In particular, the sql:datatype annotation is used to define the column data types. You can also get the bulk load component to define a primary key column by including an id attribute in the schema and setting the SGUseID property of the SQLXMLBulkLoad object to True. For example, you could change the Product element declaration in the catalog schema to the following:

 <ElementType name="Product" sql:relation="SpecialProducts">     <AttributeType name="ProductID" sql:datatype="int" dt:type="id"/>     <AttributeType name="ProductName" sql:datatype="nvarchar(40)"/>     <AttributeType name="SupplierID" sql:datatype="int"/>     <AttributeType name="QuantityPerUnit" sql:dataype="nvarchar(40)"/>     <AttributeType name="UnitPrice" sql:datatype="money"/>     <attribute type="ProductID" sql:field="ProductID"/>     <attribute type="ProductName" sql:field="ProductName"/>     <attribute type="SupplierID" sql:field="SupplierID"/>     <attribute type="QuantityPerUnit" sql:field="QuantityPerUnit"/>     <attribute type="UnitPrice" sql:field="UnitPrice"/> </ElementType> 

Then you could perform the bulk load with the SchemaGen and SGUseID properties set appropriately, as you see here:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.SchemaGen = True objBulkLoad.SGUseID = True objBulkLoad.Execute c:\GenCatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

The combination of the id attribute in the schema and the SGUseID property in the code would cause the following table to be created:

 CREATE TABLE SpecialProducts  (     CategoryID int NULL ,     ProductID int NOT NULL ,     ProductName nvarchar (40) NULL,     SupplierID int NULL ,     QuantityPerUnit nvarchar (40) NULL ,     UnitPrice money NULL ,     PRIMARY KEY  CLUSTERED (ProductID) ) 

Dropping Existing Tables

Sometimes you'll want to completely refresh the data in the database by dropping the existing tables, re-creating them, and importing the new data. For example, you might want to perform a bulk load of a catalog that not only includes new products, but also includes updated prices for existing products. The easiest way to update the entire catalog is to delete the existing records and replace them with the new data.

You can instruct the bulk load component to drop the tables mapped in the mapping schema by setting the SGDropTables property of the SQLXMLBulkLoad object to True, as shown in this code:

 Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad") objBulkLoad.ConnectionString = _     "provider=SQLOLEDB;data source=DBServer1;database=Northwind;" & _     "Integrated Security=SSPI;" objBulkLoad.SchemaGen = True objBulkLoad.SGDropTables = True objBulkLoad.Execute c:\GenCatalogSchema.xml, c:\CatalogData.xml Set objBulkLoad = Nothing 

Executing the preceding code drops the tables referenced in the schema and then re-creates them and imports the data from the CatalogData.xml file. So you should set the SchemaGen property to True when using the SGDropTables property. The ImportGenSchema.vbs script in the Demos\Chapter8 folder on the companion CD can be used to create new tables in the Northwind database.



Programming Microsoft SQL Server 2000 With Xml
Programming Microsoft SQL Server(TM) 2000 with XML (Pro-Developer)
ISBN: 0735613699
EAN: 2147483647
Year: 2005
Pages: 89

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net