Manipulating Heterogeneous Data with OPENQUERY , OPENROWSET , and Linked Servers

Manipulate heterogeneous data. Methods include linked servers, OPENQUERY , and OPENROWSET .

So far, all the data you've been able to access has been stored in databases on one SQL Server. This section talks about how to query data that's on other SQL Servers and in other systems without importing the data. Later sections in this chapter deal with how to import and export data with SQL Server.

You can use three different types of syntax to access data; each one serves a different purpose. First, you're going to see OPENROWSET , which is the most flexible but hardest to use of the three. Then OPENQUERY , which is easier to use but requires a bit more setup. Then you'll see how linked servers are set up and used.

Each of the three methods OPENROWSET , OPENQUERY , and linked serversrelies on having an operable OLE-DB driver for the system you're trying to talk to. SQL Server 2000 ships with OLE DB drivers for SQL Server, text files, Oracle, IBM's DB/2, and others. To understand how to use each of these methods, you need to understand how to talk to an OLE-DB provider.

A Brief Discussion on OLE-DB

OLE-DB is a low-level database access protocol that is designed to enable a client to access different systems without having to under stand how each different system works. The foundation of OLE-DB is the OLE-DB provider, which has to implement certain methods. At this level, an OLE-DB provider needs to know certain things to make a connection to the system.

In the case of another SQL Server, the OLE-DB provider needs to know the name of the server and how to log into the server. In the case of a text file, the OLE-DB provider needs to know where the file is and what format the file is in, as well as other properties about the file, such as delimiters, whether the first row contains column names , and so on. The point here is that different providers have different requirements, but that after you get past those requirements they all work the same way as far as retrieving data.

NOTE

Which OLE- DB Providers Are Installed? You can find a list of the OLE-DB providers that you have installed in the Registry at HKEY_LOCAL_MACHINE\Software\Microsoft\MSSQLServer\Providers\ SQLOLEDB .

The only way to understand how to use the provider is to consult the documentation for that provider and for that version of the provider. That's not a very good answerit's not an answer that is going to make your life easier. Unfortunately, it's the only answer you're going to get.

Using OPENROWSET

The OPENROWSET function returns a rowset to SQL Server. It's used in a SELECT statement in the same place as a table. Here's the syntax of the OPENROWSET function:

 OPENROWSET('provider_name', {'datasource';'userid'; 'password'} or 'provider_string',           {catalog.schema.object} or 'query')

The exact definitions of the different arguments are in the list that follows , but here's the general idea of how the syntax works. You have to use a provider_name . You can either use datasource , userid , and password , or you can use a provider_string . You can also specify a catalog.schema.object or use a query. Notice that the datasource , userid , and password are delimited by semicolons. The reason for this is simple: to give the people who write the MCP exams something to trick you with. Here are the parameters for the OPENROWSET function.

provider_name . This is the name of the provider, as specified in the Registry. It is not optional.
datasource . This is the name of the file, server, or whatever the OLE-DB provider needs to figure out what it should be talking to.
user_id . This is the user name that the provider understands. Some providers may not need a user name, so this could be blank.
password . This is the password that the provider understands. Some providers may not need a password, so this could be blank also.
provider_string . This is a free text field that has everything else that the provider needs to initiate the connection. Only the provider knows for sure what this is; you'll have to dig through piles of documentation.
catalog . This is the name of the database, catalog, or whatever the provider understands. It's the top level of the hierarchy of object names. It's also likely to be blank if the provider doesn't use it.
schema . This is the name of the owner of the object. It also may be blank if the provider doesn't use it.
object . This is the name of the object being manipulated. Believe it or not, this may be blank if there's only one object in the datasource, such as a text file.
query . This is a string that's provider-specific, and it's passed directly to the provider as a query. It may be a SQL query; it may be something else. It's also not processed by SQL Server, so you should make sure that the syntax is valid before you send it, otherwise you'll get a very incomprehensible error message.

If you are somehow missing the point, nearly every single argument in the entire OPENROWSET function is optional, and it's use or lack thereof depends on the provider you are using. Experience dictates that the documentation and examples for how to use the particular provider in which you are interested will be either non-existent or inaccurate. In other words, good luck.

Actually, OPENROWSET is a very useful function; it's just complicated and difficult to set up. After you have it figured out for a particular provider, it works very well. You should plan on spending several hours with a new OLE-DB provider to figure out how it works. Here's an example of OPENROWSET in use:

 SELECT * FROM openrowset('sqloledb',        'SQLTest1';'sa';'',        'SELECT * FROM master.dbo.sysobjects' )

This example is a fairly simple case; it just returns a rowset from another SQL Serverin this case a SQL Server running on a box named SQLTest1, which has a blank SA password. You can tell it's a request to another SQL Server because 'sqloledb' is the provider, and that's the name of the provider used to talk to other SQL Servers. It runs a simple SELECT statement and returns a simple rowset. Notice that it uses a full three-part name for the table. This isn't required, but it is good practice for doing remote queries. You can then use the results just as you would any other table: You can join them to other local tables, filter them, or whatever you like:

 SELECT SQLTest1sysobjects.name FROM openrowset('sqloledb',        'SQLTest1';'sa';'',        'SELECT * FROM sysobjects' ) SqlTest1sysobjects INNER JOIN sysobjects        ON sysobjects.name = SQLTest1sysobjects.name WHERE SQLTest1sysobjects.type = 'u'

Notice that this uses a table alias to reference the OPENROWSET return values; it filters on the object type from the remote table and does a join. So it really does work as a table does. But the syntax is incredibly cumbersome. You're probably hoping there's some shortcut you can use to avoid all the complexity. Well, read ahead to see how that can happen.

Creating a Linked Server

You can think of a linked server as a prebuilt set of arguments for OPENROWSET stored in a database object. You can just create an object called a linked server, and create it with all the attributes that you'd normally use in a call to OPENROWSET . Then, rather than having to type all those parameters in over and over again, you can just use the linked server.

There are two ways to create a linked server: Using the Query Analyzer and using Enterprise Manager. Which one is easier? Well, the explanation for using Query Analyzer is going to take a paragraph, and the explanation for using Enterprise Manager is going to use an involved Step by Step. You be the judge.

In Query Analyzer, you create the linked server using the sp_addlinkedserver system stored procedure:

  sp_addlinkedserver [ @server = ]  'server'  [ , [ @srvproduct = ]  'product_name' ]  [ , [ @provider = ]  'provider_name' ]  [ , [ @datasrc = ]  'data_source' ]  [ , [ @location = ]  'location' ]  [ , [ @provstr = ]  'provider_string' ]  [ , [ @catalog = ]  'catalog' ]

Does that look familiar? The only parameters that aren't part of the OPENROWSET you've already learned about are server and provider ; the rest are just bad names on the same parameters. Server is the name of the linked server you are creating. Any SQL Server-approved object name works; it doesn't necessarily have to be the name of the server, but it probably should be. The srvproduct argument is the name of the OLE-DB data source. The provider argument specifies the same thing as the srvproduct , but does it with an OLE-DB PROGID, which is a long identifier. The rest of the options are from OPENROWSET . If you're linking to another Microsoft SQL Server, you can specify a @srvproduct of SQL Server only, and the @server argument must be the server's name. None of the other arguments are required or allowed. Here's a linked server definition for the server you saw used in the previous section:

 sp_addlinkedserver @Server = 'OpenRowsetTest',        @SrvProduct = 'sqloledb',        @Provider = 'sqloledb',        @DataSrc = 'SQLTest1'

Thenand this is the really great partyou can query it like this:

 SELECT * FROM openrowsettest.master.dbo.sysobjects

That's a four-part name. They are required when using linked servers. You can't shortcut the object owner part either; you have to fully specify the entire thing. You could, of course, alias it, join it, or whatever you want to do; it's going to act just like a normal table.

Step by Step 9.1 takes you through one command operation done in full color in Enterprise Manager.

STEP BY STEP

9.1 Creating a Linked Server in Enterprise Manager

Open SQL Server Enterprise Manager. If you don't have the server you want to work with registered yet, you should do that now.
Expand the server you want to work with, then expand the Security container, and then click on the Linked Server container.
In the right side of the window, right-click in an empty spot and choose New Linked Server from the context menu. The Linked Server Properties - New Linked Server window appears, as seen in Figure 5.1.

Figure 5.1. Window used to define a linked server.
There are a few boxes to fill in here. Fill in the Linked Server box with the name of the linked server you want to use. If the linked server is a SQL Server, you need to use the real name of the server in the Linked Server check box.
If it is a SQL Server, choose the SQL Server radio button. This disables the rest of the input on the screen, and you can continue on with the next step. Otherwise, choose the Provider Name from the drop-down box, then fill in the product name, data source, and provider name as appropriate for the provider you are using. If you're using SQL Server, you have to put in the server's real name.
On the Security tab, shown in Figure 5.2, you can choose how users will log in to the linked server. A discussion on this tab follows these instructions.

Figure 5.2. Window used to set the security options for the linked server.
The Server Options tab, shown in Figure 5.3, includes various options you can use to change the status of the connection. The Collation Compatible option specifies whether the remote server uses the same collation as the local server. The next three options, Data Access, RPC, and RPC Out, define how the connection can be used. A Data Access connection allows tables to be queried, whereas an RPC connection allows only stored procedures to be called. The RPC Out option enables the server to call remote stored procedures on the local machine. The Use Remote Collation options specifies that the linked server's collation order is to be used to run the query. Collation Name is the name of the collation to be used if the server is not collation compatible or if the Use Remote Collation option is off. Connection Timeout is the length of time a query should wait for the server to respond to a connection request, and a Query Timeout is the length of time a query should wait for a response.

Figure 5.3. Other options for the linked server.
Click OK to create the linked server.

The Security tab, mentioned in Step 6 in the Step by Step, needs a little more elaboration on a couple of fronts. First of all, in Figure 5.2 you'll notice that there are two different parts to the window. In the top part you can specify local logins and map them to logins on the remote server. In the bottom part you can specify what happens to local logins who aren't in the list. You can specify that users who are not on the list will not be allowed to use the remote server by choosing the Not Be Made option. You can choose that the users not on the list be forced into "guest" status, if available, on the remote server by choosing the Be Made Without Using a Security Context option. You can choose to attempt to pass through the authentication that the user has on the local server by choosing the Be Made Using the Login's Current Security Context item, or you can force everyone who isn't on the list in the top part of the window to use a specific login and password with the Be Made Using This Security Context option.

Now that you know how to create linked servers, it's time to look at using them in a little more detail.

Using Linked Servers

There are two ways to use an established linked server. The firstand easiest to understandmethod is to use the four-part name as shown in the Step By Step 9.1. As you remember, a three-part name is database.owner.object , so for the sysobjects table in the Master database, you can reference it as master.dbo.sysobjects . For a linked server, you can use a four-part name, which is server. database.owner.object , or, in the example above, sqltest1.master.dbo.sysobjects .

The four-part name works great when you're using other SQL Servers as the remotes. What happens when you want to use something other than SQL Server on the other side? Then you need to use the OPENQUERY function. OPENQUERY looks suspiciously similar to OPENROWSET , but it takes only two arguments: the linked server name and a query. So you could run this:

 SELECT * FROM openrowsettest.master.dbo.sysobjects

or you could run this:

 SELECT * from OPENQUERY(OpenRowsetTest, 'SELECT * FROM master.dbo.sysobjects')

Notice that the linked server name does not have quotation marks around it. SQL Server doesn't like quotation marks there. Both these queries return the same resultset.

These linked servers are really great, but don't get any ideas about using them to import data. Although it works well for smaller datasets (less than a few hundred thousand rows), there are significantly more efficient ways to handle data imports, which is what is covered in a few sections. Next up, however, is a discussion on the cutting edge of Internet technology: Using XML.

Manipulating Heterogeneous Data with OPENQUERY , OPENROWSET , and Linked Servers