Data Manipulation Language (DML) | Beginning Transact-SQL With SQL Server 2000 and 2005

The basic statements of DML are introduced in this chapter with elaboration to follow in later chapters.

You can do only four things with data. You can Create records, Read them, Update record values, and you can Delete records. That spells CRUD. . . we do CRUD with data. When SQL was devised, they chose to use different words for these four operations: Insert, Select, Update, and Delete. Somehow, ISUD isn't quite as easy to remember as CRUD. If you can master these four types of statements, you will be able to do just about anything with data using SQL. Here's the catch: Inserts, Updates, and Deletes are a piece of cake. On the surface, the Select command can also appear simple.

Queries Have Layers

In the movie Shrek, Mike Myers' character Shrek the Ogre explains to his friend Donkey that "Ogres are like onions — they have layers." To some degree, the SELECT statement is like an Ogre, or rather it's like an onion — it has layers. On the surface, there isn't that much to it. However, when you start peeling back the layers, there's quite a lot to it. You've likely discovered this fact on your own. Here's the important point: it's not complicated. It's really just layers upon simple layers. The fundamentals are quite simple. I call this principle compounded simplicity.

Before I can effectively introduce the next topic, I need to jump the gun a little bit and briefly discuss the SELECT statement. This statement is covered thoroughly in Chapter 5. For now, it's important to understand that to return data from tables in SQL Server, you will use the SELECT statement. In relational databases, information gets transformed into data, typically by storing it in multiple tables. It would stand to reason, then, that to turn data back into useful information it must be retrieved from multiple tables. This is accomplished by using a handful of techniques: joins, subqueries, and unions. You learn more about these topics in future chapters. For now, know that these represent the bulk of the work you will do as you create and use queries.

Here's a simple example. When the following query runs, the query processor parses the query and breaks it down into individual steps.

 SELECT TOP 10 Product.Name ,SalesOrderDetail.LineTotal  FROM Product INNER JOIN SalesOrderDetail ON Product.Productid = SalesOrderDetail.ProductID WHERE SalesOrderDetail.SpecialOfferID = 1 ORDER BY SalesOrderDetail.LineTotal DESC

Figure 4-1 shows the execution plan for the preceding query. Reading from right-to-left, the query optimizer chooses to implement each operational step based on available resources and statistical information about the data. The first two steps (at the beginning point of the two branches) show you that clustered index scans were used to initially retrieve data from both of the tables referenced in this query. Subsequent steps are chosen and analyzed for efficiency as the data is handled through the query process.

image from book
Figure 4-1:

The low-level instructions used to process these steps are compiled into executable instruction code and cached in-memory so that subsequent executions don't require the same degree of preparation and resource overhead. Depending on whether this query is part of an ad-hoc SQL statement or a saved database object, the compiled instructions may also be saved to permanent storage, improving efficiency in the long term.

Set-Based Operations

When SQL Server processes a SELECT command, it builds a structure in memory to return a result set. This structure, essentially a two-dimensional array of rows and columns, is known as a cursor. The word cursor is an acronym for CURrent Set Of Records. As such, it represents the entire set of rows returned from a table or query. SQL Server's query-processing engine is built on a foundation of cursor processing and is optimized to work with data as a set of records, rather than individual rows.

Row-Based Operations

A technique more popular in other database products is to populate a cursor type variable from a SELECT query and then step through each row. You can do this in SQL Server but it often works against the query-processing engine. Whenever possible, it is advisable to work with this set-based result paradigm rather than trying to process individual rows.

Row-level cursor operations have their place. This technique is discussed in Chapter 10.

Query Syntax Basics

A query is like a sentence; it must to be a complete statement with at least a noun and a verb. The semantic rules of SQL define a simple structure. You start with a clause that states what you intend to do: Select, Insert, Update, or Delete — these are the verbs. You also must define the columns or values to be returned. Usually, you will indicate the table or other database object you want to work with — this is the subject or noun. Depending on the type of operation, there are connecting words such as From and Into.

You'll learn about each of these statements in greater detail later but, for now, some simple examples follow. If you want to retrieve all of the column values from all rows in the Product table, you would execute the following query:

 SELECT * From Product

If you need to raise the cost of all product records by ten percent, this statement would work:

 UPDATE Product SET StandardCost = StandardCost * 1.1

The Transact-SQL language is very forgiving when it comes to formatting statements. The SQL Server query-processing engine doesn't care about whether commands are in upper- or lowercase. It doesn't care about spaces, tabs, and carriage returns as long as they don't interfere with the name of a command or value. This means that you can format your script for readability just about any way you like. For example, the following query returns product sales information for a range of dates, sorted by product category and subcategory. The query could be written like this:

 SELECT ProductCategory.Name AS Category, ProductSubCategory.Name AS SubCategory,  Product.Name AS ProductName, SalesOrderHeader.OrderDate, SalesOrderDetail.OrderQty,  SalesOrderDetail.UnitPrice FROM SalesOrderHeader INNER JOIN SalesOrderDetail ON  SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID INNER JOIN Product ON  SalesOrderDetail.ProductID = Product.ProductID INNER JOIN ProductSubCategory ON  Product.ProductSubCategoryID = ProductSubCategory.ProductSubCategoryID INNER JOIN  ProductCategory ON ProductSubCategory.ProductCategoryID = ProductCategory.ProductCategoryID WHERE SalesOrderHeader.OrderDate BETWEEN  '1/1/2003’ AND  Product.Name

Or, it could be written like this:

 SELECT       ProductCategory.Name AS Category , ProductSubCategory.Name AS SubCategory , Product.Name AS ProductName , SalesOrderHeader.OrderDate , SalesOrderDetail.OrderQty , SalesOrderDetail.UnitPrice FROM     SalesOrderHeader INNER JOIN SalesOrderDetail ON SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID INNER JOIN Product ON SalesOrderDetail.ProductID = Product.ProductID INNER JOIN ProductSubCategory ON Product.ProductSubCategoryID = ProductSubCategory.ProductSubCategoryID INNER JOIN ProductCategory ON ProductSubCategory.ProductCategoryID = ProductCategory.ProductCategoryID WHERE    SalesOrderHeader.OrderDate BETWEEN  ORDER BY ProductCategory.Name, ProductSubCategory.Name, Product.Name

Obviously, the second query is easier to read and would be much easier for someone to look at and figure out what's going on. Because both uppercase and lowercase statements are acceptable, a query could be written as follows:

 select name, standardcost from product

Although the preceding statement would execute just fine, it's not quite as easy to read as the following:

 SELECT Name, StandardCost FROM Product

Naming Conventions

There seems to be a universal concept that anything that can be very simple and uncomplicated must become confusing and overly complicated. When a database is created, objects should be named according to some kind of sensible naming convention. There is no industry-wide standard, and people have different ideas about appropriate naming conventions. Most folks perceive this as a simple matter of common sense, so they don't put much effort into it. The problem with common sense is that it's not very common and everyone seems to have their own idea about what is sensible.

It would be very convenient to have one simple standard to follow, and if things were that simple, I'd tell you exactly what to do. Most of it is quite easy, but object naming is a bit of an art. There are many considerations. For example, it's a good idea to use names that are descriptive and complete, describing the purpose of each object. On the other hand, you should keep names short and simple so users don't have to do a lot of extra typing. These are conflicting directions.

Be cautious about using any names that duplicate the name of a command or other reserved word. Before deciding upon a table, field, or other object name, review the list of data types, SQL clauses, and function names. Although it is possible to use some of these names, it's never a good idea. Once you have established a name, don't reuse it for something else. For example, the Product table shouldn't contain a Product column. SQL Server will not complain if you do this, but it will be very confusing to someone trying to write queries or read your SQL script.

Some older database products don't support mixed-case names or names containing spaces. For this reason, many database administrators continue to use all lowercase names with words separated by underscores. Personally, I find mixed-case names a little easier on the eyes.

Trends come and go. With Windows 95, Microsoft promoted the use of long file names. Microsoft Access, which was developed at about the same time, also promoted the use of long database object names. From a certain perspective, it makes sense to use friendly, sentence-like, descriptive names. The fact is that SQL Server will have no problem with names containing spaces, but other components of a solution may have issues with this. As values are handled at different levels of an application, they may move from the controls in the user interface to variables in the program code, then to the method parameters or properties of a class. Eventually, these values are passed into a stored procedure as parameters or as field names in a SQL statement. The point is that it is much easier on everyone involved if these items all have the same, or very similar, names. Figure 4-2 shows an example of the data flow through a sample application.

image from book
Figure 4-2:

You could argue that there would be no harm in using space-filled field names in the database and similar names, sans spaces, elsewhere — and you'd probably be right. The general belief among database professionals is that spaces don't belong in object names. Frankly, this is probably more of an issue of perception, rather than technical feasibility.

I've done a lot of one-man solution development where I create the database, write the software components, design and develop the user interfaces, and write all the program code to glue the pieces together. Even in these applications, it's easy to get lost if related object names aren't the same. I have always insisted that they be consistent throughout the entire solution. I've also worked on some fairly large, complex projects where the database was designed by someone else long ago. If the names aren't clear and concise in the beginning, I'm faced with a quandary: change the names in my program code to something easier to understand (accepting that they don't match the table and field names), or consistently apply the same cryptic and confusing names throughout the entire solution.

It's not uncommon for a database designer to model and create tables, applying his or her own naming standards to table and field names. After this person has moved on to a different job, another database expert comes in and adds stored procedures. He might disagree with the names applied by the original designer, so he names the procedures and input parameters differently than the fields in the table. Along comes an outside consultant developing software components and he uses abbreviated names for the related class properties that correspond to the fields and parameters. Later, a junior-level software developer is assigned to create a user application for the data. He takes a class or reads a book about appropriate object naming standards and decides to fix the problem by applying his own names in spite of those that already exist. Coincidentally, I just finished modifying some report queries today. I had designed the tables these reports used. In testing, I discovered that performance wasn't ideal and decided to build another table with pre-aggregated data. Another database designer stepped in to help and named some of the columns differently than mine. For example, I have a column named FiscalMonthYearNumber and his was FiscalMonthNum. Is this a big deal? Not really, but it does require that I fix the queries for all of my reports.

There is no easy solution to this common problem. Ideally, the person who designs the database should carefully consider the impact of the names he chooses and document them thoroughly. This sets the standard for all those who follow — and all names should remain consistent. I typically use mixed-case names, capitalizing each word and concatenating them together. In programming circles, this is often referred to as Pascal Case, named after the Pascal programming language. The following table shows a few common naming standards with some of the history and pros and cons regarding each.

Open table as spreadsheet

Naming Standard	Example	Description
Pascal Case	CustomerFirstName	All words are capitalized and concatenated together without delimiting characters.
Camel Case	customerFirstName	All characters are lowercase except the first letter of each word after the first word. This standard is common in XML element names but not as common in database object naming.
Hungarian Notation	VcCustomerFirstName mstrCustomerFirstName	Objects are prefixed with characters used to denote data type and/or scope. This standard is more common in programming code than in database object naming. True Hungarian Notation can be very complicated and verbose.
Lower-case, delimited	customer_first_name	Comes from legacy database products that don't support mixed-case names. Still commonly used due to backward compatibility and tradition.
Long Names	Customer First Name	Promoted in Microsoft products such as Access. Has the advantage of being readable but not commonly used in serious software solutions. Incompatible with related programming code.

Commenting Script

When you write SQL script, it will inevitably be easy to read and understand — at the time you write it. Programming is something we do in a certain context. When I work on a project, my head is in that project and most everything makes perfect sense at the time. I once developed a database application for a consulting client. A few different people had been involved in writing queries and program code over a few months. They asked me to come back in to create some new reports. As I opened the project, reading through the code, I found it difficult to understand the logic. In my frustration, I said, "I don't understand what is going on in this code. Who wrote this?" My customer replied to my dismay, "Paul, that's your code. You wrote it last year." Needless to say, I was embarrassed as well as frustrated.

I learned a valuable lesson: comment everything. No matter how plain the logic seems. No matter how much sense it makes at the time, it probably won't make so much sense to the next person who reads it, especially months or years later. If nothing else reminds you of this simple lesson, just remember this: Every one of us leaves a legacy. Other query designers and programmers will remember you for what you leave behind for them to maintain. They will most likely either remember you for making their job difficult or for making their job easier.

Comments typically are made in two forms that include header blocks and in-line comments. A header block is a formal block of text that precedes every scripted object, such as a stored procedure or user-defined function. It should conform to a standard format and should contain information such as the following:

The name of the scripted object
The name of the designer or programmer
Contact information
Creation date
Revision dates and notes
Information about what the object does and how it's called
Validation testing and approval notes

Comments in Transact-SQL can be either in block format or on a single line. Block comments begin with a forward slash and at least one asterisk (/*) and end with an asterisk and a forward slash (*/). Everything in between is treated as a comment and ignored by the query parser. A header block doesn't need to be complicated. It should just be consistent. Here's an example of a simple header block preceding the script for a stored procedure used to insert product records:

 /******************************************************************* spInsProduct – Inserts product records Accepts ProductName, StandardPrice, QtyInStock, CategoryID Returns new ProductID, Int 6-12-04 by Paul Turley (paul@sqlreportservices.com) Revisions: 7-10-04 – PT – Added MarkupPercent parameter 7-12-04 – PT – Changed data type from Int to Decimal ********************************************************************/

In-line comments are placed in the body of the script to document the process and flow along with the actual script. Comments are preceded by two hyphens (--). The query parser ignores the remainder of the line. In-line comments can be placed after executable script on the same line or can be written on a separate line, as you can see in the following example:

 REATE PROCEDURE spGetCustomer -- Define City parameter, set default to Null if parameter not passed @City VarChar(25) = NULL AS IF @City IS Null       -- Check for Null (parameter not passed) BEGIN -- Return all Store records SELECT Store.Name AS StoreName, Address.City FROM Store INNER JOIN Customer ON Store.CustomerID = Customer.CustomerID INNER JOIN CustomerAddress ON Customer.CustomerIDCustomer = Address.CustomerID INNER JOIN Address ON CustomerAddress.AddressID = Address.AddressID END ELSE BEGIN -- Return Store records only for matching City SELECT Store.Name AS StoreName, Address.City FROM Store INNER JOIN Customer ON Store.CustomerID = Customer.CustomerID INNER JOIN CustomerAddress ON Customer.CustomerID = CustomerAddress.CustomerID INNER JOIN Address ON CustomerAddress.AddressID = Address.AddressID WHERE Address.City = @City END

If in doubt, add a comment. If not in doubt, add one anyway. Don't worry about overdoing it. Granted, some of your script will make sense without commenting and may be somewhat self-documenting, but don't take a chance. Don't listen to yourself when that little voice says "Don't worry about commenting your code now. You can do it later." Maybe you're more disciplined than I am, but if I don't write comments when I'm writing code, it won't get done.

Another important application of in-line comments is temporary development notes to myself and others. Inevitably, on the first pass through my script, I'm most concerned about getting core functionality working. Exceptions to basic logic, problem workarounds, error-handling, and less-common conditions are usually secondary to just getting the code to work once under ideal conditions. As I consider all of these secondary factors, I make notes to myself that may include to-do items and reminders to go back and add clean-up code and polished features.

Using Templates

Query Analyzer contains a very useful, and often underutilized, feature. Templates provide a starting place for a variety of database object scripts. Several templates come with SQL Server and adding your own is an easy task. In Chapter 3 you learned how to use the script template features in Query Analyzer and the SQL Server Management Studio. In reality, a template is just a text file containing SQL commands and placeholders for object names. Using a template can save considerable time and effort, especially when writing script that you may not use very often. For example, I don't write an Instead Of Trigger very often, so I may not be able to remember the syntax off the top of my head. Rather than scouring Books Online or searching the web for help, I'll simply open this template in a new script window:

 -- ============================================= -- Create basic Instead Of Trigger -- ============================================= IF EXISTS (SELECT name  FROM   sysobjects WHERE  name = AND    type = 'TR’) DROP TRIGGER <trigger_name, sysname, trig_test> GO CREATE TRIGGER <trigger_name, sysname, trig_test> ON <table_or_view_name, sysname, pubs.dbo.sales> INSTEAD OF INSERT AS BEGIN RAISERROR (50009, 16, 10) EXEC sp_who END GO

This script provides a basic pattern to get you started. If you'd like to create your own templates, this is very easy to do. Simply write the script in a new query window and then use the File Save As menu to save it to a Template SQL file. In Query Analyzer, templates are saved with a tql extension, but in the SQL Management Studio templates are saved with a sql extension, like other script files.

Template files saved to the standard template folders will be added to the available templates for Query Analyzer and the SQL Management Studio. SQL Server 2000 assigns the extension .TQL to template files. SQL Server 2005 uses the standard .SQL extension. In a default SQL Server installation, these folders are found in the following locations:

Open table as spreadsheet

Version	Templates Folder Path
SQL Server 2000	C:\Program Files\Microsoft SQL Server\80\Tools\Templates
SQL Server 2005	C:\Program Files\Microsoft SQL Server\90\Tools\Binn\VSShell\ Common7\IDE\sqlworkbenchnewitems\Sql

Generating Script

The term script simply refers to a bunch of SQL statements typically used together to perform some useful purpose. This could vary from a simple SQL query to a group of several queries and commands used to create an entire database. The SQL Server client and administrative tools (Enterprise Manager, Query Analyzer, Workbench, and Visual Studio) have the ability to generate script for just about any object or combination of objects. You can even generate the script to re-create an entire database. Script is usually saved to a file with a sql extension and is simply plain text. You could easily create or edit a script file with Notepad or any other text editor.

Managing Script

I suggest that you have a designated folder for your script files. I usually create a folder for each database project or application I'm working on. I will also make it a point to back up and archive the script files in these folders. Just like program source code, my scripts are treated like gold. Especially for the programming objects in your database (views, functions, and stored procedures), these objects should be scripted and saved on a CD or protected network folder. This practice will be invaluable in case (make that when) something goes wrong.

To keep you and your co-workers on their toes, you may find it helpful to stage a routine fire drill of sorts. On a bi-weekly basis, designate a hard disk or folder as corrupted or deleted and then go in and take inventory of your losses. You can use this exercise to let other team members know how well your recovery plan is working and how it can be improved.

Version Control

One of the greatest challenges in managing scripts is how to keep track of multiple versions of the same file. The natural order of application development involves prototyping a feature, working through iterations of design, testing, and debugging until the feature is stable. At this point, it is important to keep a copy of the working SQL script and program source code before any changes are made. Adding features and capabilities to a query or programming object nearly always has adverse effects, at least in the short term. The secret to success is to script and save your queries and objects to script files after you get them working.

On many occasions, I have been asked to make some minor change or enhancement to a project. It may involve adding a column to a table or just adding a calculation to a query. I do a quick test and then implement the change on the production server the night before flying out to a training engagement. Almost inevitably, the change will have some unforeseen impact. Perhaps my customer calls in a panic to inform me that she's getting an error, it's the end of the month, and they can't print invoices.

Making this minor change often seems like a good idea at the time. Fortunately, if objects were scripted before making changes, it's usually a simple task to either review the original script and make corrections, or to run the original script, returning the query to its previous, working state. Script version management is not complicated, but without having a system in place and making a deliberate effort to follow it, it's easy to lose track of your changes.

A few simple approaches to version control exist. Version control software, such as Microsoft Visual SourceSafe, automates this task by storing files in a central database. As files are checked out and checked in, a separate copy is time-stamped and stored in the SourceSafe database. Any version of the file can then be retrieved at any time. SourceSafe is the best bet if a group of people will be sharing script files and working on different, networked development computers. SQL Server 2000 has no built-in integration with Visual SourceSafe, but SourceSafe is easy enough to use at the file system level. The SQL Server 2005 Workbench does integrate with Visual SourceSafe. Much like Visual Studio, files can be checked out, checked in, and managed from within the Workbench design environment.

A less-sophisticated approach is to simply append file names with the date they are created and the initials of the person creating them. Keep these files in project-related folders and back them up regularly. The following are examples of the script files for a stored procedure called spGetCustomerAccountDetail:

Create spGetCustomerAccountDetail – 7-02-04 PT.sql
Create spGetCustomerAccountDetail – 7-09-04 PT.sql
Create spGetCustomerAccountDetail – 7-11-04 PT.sql
Create spGetCustomerAccountDetail – 7-15-04 PT.sql
Create spGetCustomerAccountDetail – 8-03-04 PT.sql