Hack 91. Export and Import Table Definitions

You want to move your data from one vendor's database platform to another. The first stage is to extract the metadata.

It should be possible for a system to export an entire database as a sequence of SQL commands. It also should be possible to import it again, on the same machine or on another machine, running the same database platform or a different one.

That's the theory. In reality, each database vendor has its own variation on SQL, which makes moving a database between platforms difficult, but certainly not impossible. The first step is to extract the table definitions and the relationships between tables. After that you can move the data itself.

Unsurprisingly, the vendors have invested in making it easy for you to import your database, but have not put as much investment into exporting. So, each platform is better at accepting ANSI standard SQL than it is at exporting that format. And, of course, each platform can import the SQL that it has exported (assuming you're importing into the same version you exported from), so this causes a problem only for moving between different vendors or versions.

11.2.1. MySQL

MySQL has facilities for exporting table definitions. The mysqldump command-line utility can display CREATE TABLE commands as well as the data itself. You can find the documentation at http://dev.mysql.com/doc/refman/5.0/ en/mysqldump.html. For exporting just the schema you can use --no-data, and if you were interested only in the data you can use --no-create-info. The output from mysqldump will need a little coaxing if you plan to import those CREATE statements into another system. The SQL that comes out of mysqldump with the default options is not going to be acceptable to any other system:

andrew@SQLZoo3:~> mysqldump --no-data u username -ppassword dbname staff
-- MySQL dump 10.10
--
-- Host: localhost Database: dbname
-- ------------------------------------------------------
-- Server version 5.0.18-standard

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
--SNIP--
--
-- Table structure for table QstaffQ
--

DROP TABLE IF EXISTS QstaffQ;
CREATE TABLE QstaffQ (
 QidQ varchar(20) character set utf8 NOT NULL,
 QnmQ varchar(200) character set utf8 default NULL,
 PRIMARY KEY (QidQ)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

--SNIP--
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

First, a number of comments appear at the start and end of the file. Your target system can safely ignore these, because /* */ delimiters indicates a comment in most SQL implementations. At worst, a target database platform will generate error messages without stopping the process. The back quotes and the ENGINE instructions will stop the other platforms from continuing, and the character set utf8 may also have to be filtered out.

A host of switches can improve the output, and by piping the output through sed you can also remove the character set information. You still get all the comments as before, but the output is much cleaner now:

andrew@SQLZoo3:~> mysqldump u username -ppassword --skip-opt 
> --compatible=ansi --no-data --skip-quote-names 
> dbname staff | sed 's/character set utf8 //'
--SNIP--
CREATE TABLE staff (
 id varchar(20) NOT NULL,
 nm varchar(200) default NULL,
 PRIMARY KEY (id)
);

--SNIP--

--skip-quote-names removes the ability to support using a reserved word as a column name. If you have used a reserved word as a column name, you may have to rename the column in order to make it possible to create the new table.

 

11.2.2. PostgreSQL

PostgreSQL has a simple export command, pg_dump, which produces relatively normal SQL. It does produce commented lines starting with --, blank lines, and unwanted SET commands, but these are easy to remove with a sed command. PostgreSQL uses the term character varying for VARCHAR; you can fix that with another sed substitution:

$ pg_dump --table=staff --schema-only --no-owner scott |
> sed -e '/^--/d' 
> -e '/^SET/d' 
> -e '/^ *$/d' 
> -e 's/character varying/VARCHAR/'
CREATE TABLE staff (
 id VARCHAR(20) NOT NULL,
 nm VARCHAR(200),
 n integer
);
ALTER TABLE ONLY staff
 ADD CONSTRAINT staff_pkey PRIMARY KEY (id);

 

11.2.3. SQL Server

In SQL Server, you can generate a CREATE script from the Query Analyzer utility (replaced by Server Management Studio in SQL Server 2005). You can do this for a table by right-clicking on it from the Object Browser, or you can do a whole database at once by selecting the Tasks menu item from the Database right-click menu. You should set a few options to make your SQL more portable; set Include If NOT EXISTS and Script Owner to false.

Here is a typical generated script:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [staff](
 [id] [nvarchar](20) NOT NULL,
 [nm] [nvarchar](200) NULL,
PRIMARY KEY CLUSTERED 
(
 [id] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

GO

You have to deal with a fair amount of SQL Serverspecific code, and you are going to have to hack the output before any other system will deal with it.

You can search and replace the GO instructions with a semicolon. Most systems generally accept a semicolon as a delimiter.

You can usually leave instructions such as SET ANSI_NULLS ON in the file. Most database system command interfaces support a mechanism for merely generating an error message on unrecognized commands, but continuing anyway onto the next instruction. So, for instance, if you were importing this into MySQL:

 This fails with an error message 
$ mysql -u username  -p password  dbname  < file.sql 

 This prints an error messages, but succeeds anyway 
$ mysql -u username  -p password  dbname 
mysql> source file.sql 

it would be easy to remove the open and close square brackets (SQL Server 2000 had an option to suppress these, but 2005 has taken away that feature).

You can fix most of the details using search and replace on any old text editor. You could instead use a few lines of Perl in the file convertMSSQL.pl:

use strict;
while (<>){ #Foreach line of the file
 s/^GO$/;/; #Replace GO with semicolon
 s/[datetime]/TIMESTAMP/I; #Replace [datetime] with TIMESTAMP
 s/[//g; #Delete [
 s/]//g; #Delete ]
 s/).*/)/; #Delete all characters following )
 print; #Print out the new version of each line
}

You can run this script from the operating system prompt:

perl convertMSSQL.pl < mssqloutput.sql

 

11.2.3.1. Dates in SQL Server

You must be especially careful when importing or exporting dates from SQL Server. SQL Server uses the DATETIME data type; other systems call this TIMESTAMP. To confuse matters further SQL Server has a TIMESTAMP type, but that does something else entirely.

When exporting from SQL Server, change DATETIME to TIMESTAMP.

When importing, change each of DATE, TIME, and TIMESTAMP to DATETIME.

11.2.4. Oracle

You need the exp and imp utilities to get the SQL CREATE statements out of Oracle. The exp command creates a binary file called expdat.dmp that contains all of your data and metadata. There is no option to allow exp to create SQL directly. The command imp is used to reimport the .dmp file. You can run the imp command so that it doesn't actually import anything, but instead prints out the SQL commands that it would have used:

[gordon@db book]$ exp USERNAME/password 'TABLES= ( DBBOSS )'

Export: Release 10.1.0.3.0 - Production on Mon Jul 17 12:10:41 2006

Copyright (c) 1982, 2004, Oracle. All rights reserved.


Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
Export done in UTF8 character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)

About to export specified tables via Conventional Path ...
. . exporting table DBBOSS 30 rows exported
EXP-00091: Exporting questionable statistics.
EXP-00091: Exporting questionable statistics.
Export terminated successfully with warnings.
[gordon@db book]$ imp USERNAME/password SHOW=Y
Import: Release 10.1.0.3.0 - Production on Mon Jul 17 12:10:46 2006

Copyright (c) 1982, 2004, Oracle. All rights reserved.


Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

Export file created by EXPORT:V10.01.00 via conventional path
import done in UTF8 character set and AL16UTF16 NCHAR character set
import server uses AL32UTF8 character set (possible charset conversion)
. importing USERNAME's objects into USERNAME
 "CREATE TABLE "DBBOSS" ("USERNAME" VARCHAR2(100), "MODID" NUMBER) PCTFREE 1"
 "0 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE(INITIAL 65536 FREELISTS 1 FREE"
 "LIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING NOCOMPRESS"
. . skipping table "DBBOSS"

 "ALTER TABLE "DBBOSS" ADD PRIMARY KEY ("USERNAME", "MODID") USING INDEX PCT"
 "FREE 10 INITRANS 2 MAXTRANS 255 STORAGE(INITIAL 65536 FREELISTS 1 FREELIST "
 "GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING ENABLE"
 "ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("USERNAME") REFERENCES "DBUSERS" ("US"
 "ERNAME") ENABLE"
 "ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("MODID") REFERENCES "DBPROG" ("MID") "
 "ENABLE"
Import terminated successfully without warnings.
[gordon@db book]$

The SQL that you can capture and reuse on another system is highlighted in the preceding code. However, the imp command has unhelpfully broken up lines and added double quotes. A little processing would take care of this, but it is actually easier to dive into the binary file expdat.dmp. This file has the SQL DDL (Data Definition Languagethe CREATE and ALTER commands) commands buried in it, and you can find them using the Unix command strings:

[andrew@db book]$ strings expdat.dmp
gEXPORT:V10.01.00
UDBRW
RTABLES
8192
 Mon Jul 17 12:10:41 2006expdat.dmp
#C##
#C##
+00:00
BYTE
UNUSED
INTERPRETED
DISABLE:ALL
METRICST
TABLE "DBBOSS"
CREATE TABLE "DBBOSS" ("USERNAME" VARCHAR2(100), "MODID" NUMBER) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE(INITIAL 65536 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT) TABLESPACE "USERS" LOGGING NOCOMPRESS
INSERT INTO "DBBOSS" ("USERNAME", "MODID") VALUES (:1, :2)
-- snip --
ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("USERNAME") REFERENCES "DBUSERS" ("USERNAME") ENABLE
ENDTABLE
TABLE "DBBOSS"
ALTER TABLE "DBBOSS" ADD FOREIGN KEY ("MODID") REFERENCES "DBPROG" ("MID") ENABLE

Once you've got a cleaned-up version you've still got to remove the Oracle-specific code. There are simple issues, such as replacing VARCHAR2 with VARCHAR, but there are also Oracle-specific additions to the end of the statement, and these can be harder to process automatically.

11.2.5. Access

Microsoft Access is handy for moving data around. It has a relatively simple wizard for importing a single table from a flat file; it is also fairly easy and intuitive to set up an ODBC connection. See "Tunnel into MySQL from Microsoft Access" [Hack #44] for an example.

Getting metadata such as schemas out of Access is a little harder. The Visual Basic code shown here will output a basic table definition into the debug window; you can copy and paste from there. The script has several limitations, however. It assumes that each table has only one index, the primary key. Also, it requires that the primary key be a single field. You should be able to customize this if you need to:

Private Sub Command0_Click( )
Dim sql As String
For i = 0 To CurrentDb.TableDefs.Count - 1
 If Left(CurrentDb.TableDefs(i).Name, 4) <> "MSys" And _
 Left(CurrentDb.TableDefs(i).Name, 4) <> "~TMP" Then
 Debug.Print "CREATE TABLE " & CurrentDb.TableDefs(i).Name & "("
 For j = 0 To CurrentDb.TableDefs(i).Fields.Count - 1
 Debug.Print " " & CurrentDb.TableDefs(i).Fields(j).Name;
 Debug.Print " " & _
 dataType(CurrentDb.TableDefs(i).Fields(j).Type, _
 CurrentDb.TableDefs(i).Fields(j).Size);
 If CurrentDb.TableDefs(i).Fields(j).Required Then _
 Debug.Print " NOT NULL ";
 Debug.Print " , "
 Next
 Debug.Print " PRIMARY KEY(" & _
 CurrentDb.TableDefs(i).Indexes(0).Fields(0).Name & ")"
 Debug.Print ");"
 Else
 Debug.Print "--Skipping table: " & CurrentDb.TableDefs(i).Name
 End If
Next i
DoCmd.RunCommand acCmdDebugWindow
End Sub

Private Function dataType(t As Integer, sz As Integer)
If t = 10 Then dataType = "VARCHAR(" & sz & ")": Exit Function
If t = 20 Then dataType = "DECIMAL(" & sz & ")": Exit Function
If t = 3 Then dataType = "INTEGER": Exit Function
If t = 7 Then dataType = "FLOAT": Exit Function
dataType = "UNKNOWN_DATA_TYPE_" & t & " "
End Function

Here is some sample output from the preceding Visual Basic script:

--Skipping table: ~TMPCLP295561
CREATE TABLE bbcRemote(
 name VARCHAR(50) NOT NULL , 
 region VARCHAR(60) , 
 area DECIMAL(16) , 
 population DECIMAL(16) , 
 gdp DECIMAL(16) , 
 PRIMARY KEY(region)
);
--Skipping table: MSysAccessObjects
--Skipping table: MSysACEs
--Skipping table: MSysObjects
--Skipping table: MSysQueries
--Skipping table: MSysRelationships
CREATE TABLE t(
 a UNKNOWN_DATA_TYPE_4 NOT NULL , 
 b VARCHAR(50) , 
 PRIMARY KEY(a)
);
CREATE TABLE Table1(
 test VARCHAR(50) , 
 PRIMARY KEY(test)
);

As it happens, you can't put this output back into Access because the data types have nonstandard names. You should use LONG in place of DECIMAL(16).

All of the properties of the database are available through this Visual Basic interface. For example, you can access all of the foreign keys through the CurrentDb.Relations collection. You can iterate over that using the same techniques as shown for the CurrentDb.TableDefs and the CurrentDb.TableDefs( i ).Indexes collections.

11.2.6. Potential Showstoppers

Some factors can make the process of switching platforms harder.

11.2.6.1. Auto-numbers

If you've used autonumbering schemes you will have a problem. You can convert the metadata without too much difficultly (see Table 11-1 for details). However, getting the data across will be challenging. You need to suspend the generation of new numbers during import, but then you have to switch it back on again when your new system is up and running.

"Generate Unique Sequential Numbers" [Hack #57] and "Generate Sequential or Missing Data" [Hack #82] may be of use in transporting autogenerated sequences.

Table 11-1. Auto-numbering column types for database systems

System Column type Notes
SQL Server
IDENTITY
 
MySQL INTEGER AUTO_INCREMENT  
PostgreSQL
INTEGER
Use NEXTVAL('seqName') to calculate the value to insert.
Oracle
INTEGER
Use seqName.NEXTVAL to calculate the value to insert.
Access
COUNTER
 

 

11.2.6.2. Spaces in table names and column names

If you have spaces in your table names or field names you will have to use the appropriate quoting mechanism. SQL Server and Access use square brackets; MySQL allows back quotes; and Oracle will let you use double quotes. You could instead convert the names to use _ rather than a space, which will solve the problem for future exports and imports.

11.2.6.3. Nonstandard functions

It is difficult to write complex queries while sticking to the standard, and in many cases it is just not possible. You will probably find that much of your SQL has to be rewritten. Table 11-2 and Table 11-3 show some of the most commonly used functions.

Table 11-2. Compatibility of common functions (PostgreSQL, MySQL)

ANSI PostgreSQL MySQL
COALESCE(x, y) ANSI ANSI
CASE WHEN b THEN t ELSE f END ANSI ANSI andIF (b, t, f)
EXTRACT(MONTH FROM d) ANSI andDATE_PART('month', d) ANSI andMONTH( d )
whn + INTERVAL '5' DAY whn + INTERVAL '5 DAY'andwhn + 5 ANSI andwhn + 5
SUBSTRING(s FROM x FOR y) ANSI ANSI andSUBSTRING (s, x, y)
a||b||c
ANSI
CONCAT(a,b,c)

Table 11-3. Compatibility of common functions (Oracle, SQL Server, Access)

ANSI Oracle SQL Server Access
COALESCE(x, y) ANSI andNLV(x, y) ANSI Nz(x, y)
CASE WHEN b THEN tELSE f END ANSI ANSI IIF(b, t, f)
EXtrACT(MONTH FROM d) ANSI and MONTH( d ) DatePart( mm, d) Month( d )
whn + INTERVAL '5' DAY ANSI andwhn + 5 DateAdd( d , whn, 5)andwhn + 5 DateAdd( d, whn, 5)andwhn + 5
SUBSTRING(s FROM x FOR y) SUBSTR(s, x, y) Substring(s, x, y) MID(s, x, y)
a||b||c
ANSI a + b + c a & b & c


SQL Fundamentals

Joins, Unions, and Views

Text Handling

Date Handling

Number Crunching

Online Applications

Organizing Data

Storing Small Amounts of Data

Locking and Performance

Reporting

Users and Administration

Wider Access

Index



SQL Hacks
SQL Hacks
ISBN: 0596527993
EAN: 2147483647
Year: 2004
Pages: 147

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net