LINQ to SQL | Introducing MicrosoftВ® LINQ

The first and most obvious application of LINQ is for querying an external relational database. LINQ to SQL is a component of the LINQ Project that provides the capability to query a relational database, offering you an object model based on available entities. In other words, you can define a set of objects that represents a thin abstraction layer over the relational data, and you can query this object model using LINQ queries that are converted into corresponding SQL queries by the LINQ to SQL component.

In LINQ to SQL, we can write a simple query like the following:

 var query =     from    c in Customers     where   c.Country == "USA"             && c.State == "WA"     select  new {c.CustomerID, c.CompanyName, c.City };

This query is converted into an SQL query that is sent to the relational database:

 SELECT CustomerID, CompanyName, City FROM   Customers WHERE  Country = 'USA'   AND  Region = 'WA'

Important

The SQL queries generated by LINQ that we show in this chapter are only illustrative. Microsoft reserves the right to change the SQL that is generated by LINQ, and at times we use simplified queries in the text. Thus, you should not rely on the SQL that is shown.

At this point, you might be asking a few questions. First, how can the LINQ query be written using object names that are validated by the compiler? Second, when is the SQL query generated from the LINQ query? Third, when is the SQL query executed? To understand the answers to these questions, you need to understand the entity model in LINQ to SQL and then the deferred query evaluation.

Entities in LINQ to SQL

Any external data must be described with appropriate metadata bound to class definitions. Each table must have a corresponding class decorated with particular attributes, which corresponds to a row of data and describes all columns in terms of data members of the defined type. The type can be a complete or partial description of an existing physical table, view, or stored procedure result. Only the described fields can be used inside a LINQ query for both projection and filtering. Listing 5-1 shows a small and simple entity definition.

Listing 5-1: Entity definition for LINQ to SQL

  [Table(Name="Customers")] public class Customer {     [Column] public string CustomerID;     [Column] public string CompanyName;     [Column] public string City;     [Column(Name="Region")] public string State;     [Column] public string Country; }

The Customer type defines the content of a row, and each field or property decorated with Column corresponds to a column of the relational table. The Name parameter can specify a column name that is different from the data member name (in this example, State corresponds to the Region table column). The Table attribute specifies that the class is an entity representing data of a database table; its Name property can specify a table name that is different from the entity name. It is common to use the singular form for the name of the class (a single row) and the plural form for the name of the table (a set of rows).

You need a Customers table to build a LINQ to SQL query over Customers data. The Table<T> generic class is the right way to create such a type:

 Table<Customer> Customers = ...; // ... var query =     from    c in Customers     // ...

Note

To build a LINQ query on Customers, you need a class implementing IEnumerable<T>, using Customer as T. However, LINQ to SQL needs to implement extension methods in a different way than the SQL to Objects implementation that we used in the Chapter 4, “LINQ Syntax Fundamentals.” For this reason, you need to use an object implementing IQueryable<T> to build LINQ to SQL queries. The Table<T> class implements IQueryable<T>. To include the LINQ to SQL extension, the statement using System.Data.Linq; must be part of the source code.

The Customers table object has to be instantiated. To do that, we need an instance of a DataContext class, which defines the bridge between the LINQ world and the external relational database. The nearest concept to DataContext that comes to mind is a database connection-in fact, a mandatory parameter needed to create a DataContext instance is the connection string or the Connection object. Its GetTable method returns a corresponding Table<T> for the specified type:

 DataContext db = new DataContext("Database=Northwind"); Table<Customer> Customers = db.GetTable<Customer>();

Listing 5-2 shows the resulting code when you put all the pieces together.

Listing 5-2: Simple LINQ to SQL query

  DataContext db = new DataContext( ConnectionString ); Table<Customer> Customers = db.GetTable<Customer>(); var query =     from    c in Customers     where   c.Country == "USA"             && c.State == "WA"     select  new {c.CustomerID, c.CompanyName, c.City }; foreach( var row in query ) {     Console.WriteLine( row ); }

The query variable is initialized with a query expression that forms an expression tree. As we noted in Chapter 2, “C# Language Features,” an expression tree maintains a representation of the expression in memory instead of pointing to a method through a delegate. When the foreach loop enumerates data selected by the query, the expression tree is used to generate the corresponding SQL query, using all metadata and information we have in the entity classes and in the referenced DataContext instance.

Note

The deferred execution method used by LINQ to SQL converts the expression tree into an SQL query that is valid in the underlying relational database. The LINQ query is functionally equivalent to a string containing an SQL command, with at least two important differences. First, it is tied to the object model and not to the database structure. Second, its representation is semantically meaningful without requiring an SQL parser and without being tied to a specific SQL dialect. The expression tree can also be manipulated in memory before its use.

The data returned from the SQL query accessing row and placed into the foreach loop is then used to fill the projected anonymous type following the select keyword. In this sample, the Customer class is never instantiated, and it is used only by LINQ to analyze its metadata.

We can explore the generated SQL query by using the GetQueryText method of the DataContext class:

 Console.WriteLine( db.GetQueryText( query ) );

The previous simple LINQ to SQL query generates the following GetQueryText output:

 SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[City] FROM   [Customers] AS [t0] WHERE  ([t0].[Country] = @p0) AND ([t0].[Region] = @p1)

An alternative way to get a trace of all SQL statements sent to the database is to assign a value to the Log property of DataContext:

 db.Log = Console.Out;

In the next section, you will see in more detail how to generate entity classes for LINQ to SQL.

Data Modeling

The set of entity classes that LINQ to SQL requires is a thin abstraction layer over the relational model. Each entity class defines an accessible table of data, which can be queried and modified. Entity instances that are modified can apply their changes on data contained in the relational database. You will see the options for data updates in the “Data Update” section of this chapter. In this section, you will learn how to build a data model for LINQ to SQL.

DataContext

The DataContext class handles the communication between LINQ and external relational data sources. Each instance has a single Connection to a relational database. Its type is IDbConnection; therefore, it is not specific to a particular database product.

Important

The architecture of LINQ to SQL supports many data providers so that it can map to different underlying relational databases. At the time of this writing, beta versions of LINQ support only Microsoft SQL Server for LINQ to SQL, but anyone can implement providers of his own.

DataContext uses metadata information to map the physical structure of relational data, on which the SQL code generation is based. DataContext can also be used to call a stored procedure and persist data changes in entity class instances in the relational database.

Classes that specialize access for a particular database can be derived from DataContext. Such classes offer an easier way to access relational data, including members that represent available tables. You can define the existing tables simply by declaring them, without a specific initialization, as in the following code:

 public class SampleDb : DataContext {     public SampleDb(IDbConnection connection)             : base( connection ) {}     public SampleDb(string fileOrServerOrConnection)             : base( fileOrServerOrConnection ) {}     public SampleDb(IDbConnection connection, MappingSource mapping)             : base( connection, mapping ) {}     public Table<Customer> Customers; }

Note

Table members are initialized automatically by the DataContext base constructor, which examines the type at execution time through Reflection, finds those members, and initializes them.

Entity Classes

An entity class has two roles. The first is to provide metadata to LINQ queries; for this purpose, an entity class is not instantiated. The second is to provide storage for data read from the relational data source, as well as to track possible updates and support their submission back to the relational data source.

An entity class is any reference type definition decorated with the Table attribute. A struct (value type) cannot be used for this. The Table attribute can have a Name parameter that defines the name of the corresponding table in the database. If Name is omitted, the name of the class is used as the default:

 [Table(Name="Products")] public class Product { ... }

Note

Although the term commonly used is table, nothing prevents you from using an updatable view in place of a table name in the Name parameter. Using a non-updatable view will work too, at least until you try to update data without using that entity class.

Inside an entity class, there can be any number and type of members. Only data members or properties decorated with the Column attribute are significant in defining the mapping between the entity class and the corresponding table in the database:

 [Column] public int ProductID;

An entity class should have a unique key. This key is necessary to support unique identity (more on this later), to identify corresponding rows in database tables, and to generate SQL statements that update data. If you do not have a primary key, instances of the entity class can be created but are not modifiable. The Boolean IsPrimaryKey property of the Column attribute, set to true, states that the column belongs to the primary key of the table. If the primary key used is a composite key, all the columns that form the primary key will have IsPrimaryKey=true in their parameters:

 [Column(IsPrimaryKey=true)] public int ProductID;

By default, a column is mapped using the same name of the member to which the Column attribute is applied. You can use a different name, specifying a value for the Name parameter. For example, the following Price member corresponds to the UnitPrice field in the database table:

 [Column(Name="UnitPrice")] public decimal Price;

If you want to filter data access through member property accessors, you have to specify the underlying storage member with the Storage parameter. If you specify a Storage parameter, LINQ to SQL bypasses the public property accessor and interacts directly with the underlying value. Understanding this is particularly important if you want to track only the modifications made by your code and not the read/write operations made by the LINQ framework. In the following code, the ProductName property is accessed for each read/write operation made by your code; a direct read/write operation on the _ProductName data member is made when a LINQ operation is executed:

 [Column(Storage="_ProductName")] public string ProductName {     get { return this._ProductName; }     set { this.OnPropertyChanging("ProductName");           this._ProductName = value;           this.OnPropertyChanged("ProductName");     } }

The correspondence between relational type and .NET type is made assuming a default relational type corresponding to the used .NET type. Whenever you need to define a different type, you can use the DBType parameter, specifying a valid type by using a valid SQL syntax for the relational data source. This property is used only if you want to create a database schema starting from entity class definitions:

 [Column(DBType="NVARCHAR(20)")] public string QuantityPerUnit;

If a column value is auto-generated by the database (which is a service offered by the IDENTITY keyword in SQL Server), you might want to synchronize the entity class member with the generated value whenever you insert an entity instance into the database. To get this behavior, you need to set the IsDBGenerated parameter to true, and you also need to adapt the DBType accordingly-for example, by adding the IDENTITY modifier for SQL Server tables:

 [Column(DBType="INT NOT NULL IDENTITY",  IsPrimaryKey=true, IsDBGenerated=true)] public int ProductID;

Other parameters that are relevant in updating data are IsVersion and UpdateCheck. You will see a deeper explanation of IsDBGenerated, IsVersion, and UpdateCheck later in the “Data Update” section.

Entity Inheritance

Sometime a single table contains many types of entities. For example, imagine a list of contacts-some of them can be customers, others can be suppliers, and others can be company employees. From a data point of view, each entity can have some specific fields. (For example, a customer can have a discount field, which is not relevant for employees and suppliers.) From a business logic point of view, each entity can implement different business rules. The best way to model this kind of data in an object-oriented environment is by leveraging inheritance to create a hierarchy of specialized classes. LINQ to SQL allows a set of classes derived from the same base class to map the same relational table.

The InheritanceMapping attribute decorates the base class of a hierarchy, indicating the corresponding derived classes that are based on the value of a special discriminator column. The Code parameter defines a possible value, and the Type parameter defines the corresponding derived type. The discriminator column is defined by the IsDiscriminator argument being set to true in the Column attribute specification. Listing 5-3 provides an example of a hierarchy based on the Contacts table of the Northwind sample database.

Listing 5-3: Hierarchy of classes based on contacts

  [Table(Name="Contacts")] [InheritanceMapping(Code = "Customer", Type = typeof(CustomerContact))] [InheritanceMapping(Code = "Supplier", Type = typeof(SupplierContact))] [InheritanceMapping(Code = "Shipper", Type = typeof(ShipperContact))] [InheritanceMapping(Code = "Employee", Type = typeof(Contact), IsDefault = true)]  public class Contact {     [Column(IsPrimaryKey=true)] public int ContactID;     [Column(Name="ContactName")] public string Name;     [Column] public string Phone;     [Column(IsDiscriminator = true)] public string ContactType; } public class CompanyContact : Contact {     [Column(Name="CompanyName")] public string Company; } public class CustomerContact : CompanyContact { } public class SupplierContact : CompanyContact { } public class ShipperContact : CompanyContact {     public string Shipper {         get { return Company; }         set { Company = value; }     } }

Contact is the base class of the hierarchy. If the contact is a Customer, a Supplier, or a Shipper, the corresponding classes derive from an intermediate CompanyContact, which defines the Company field corresponding to the CompanyName column in the source table. The CompanyContact intermediate class is necessary because you cannot reference the same column (CompanyName) in more than one field, even if this happens in different classes in the same hierarchy. The ShipperContact class defines a Shipper property that exposes the same value of Company, with a different semantic meaning.

Note

This approach requires you to flatten the union of all possible data columns for the whole hierarchy into a single table. If you have a normalized database, you might have data for different entities separated in different tables. You can define a view to use LINQ to SQL to support entity hierarchy, but to update data you must make the view updatable.

The level of abstraction offered by having different entity classes in the same hierarchy is well described by the sample queries shown in Listing 5-4. The queryTyped query uses the OfType operator, while queryFiltered relies on a standard where condition to filter out contacts that are not customers.

Listing 5-4: Queries using a hierarchy of entity classes

  var queryTyped =     from    c in contacts.OfType<CustomerContact>()     select  c; var queryFiltered =     from    c in contacts     where   c is CustomerContact     select  c; foreach( var row in queryTyped ) {     Console.WriteLine( row.Company ); } // We need an explicit cast to access the CostumerContact members foreach( CustomerContact row in queryFiltered ) {     Console.WriteLine( row.Company ); }

The SQL queries produced by these LINQ queries are functionally identical to the following one. (The actual query is different because of generalization coding.)

 SELECT [t0].[ContactType], [t0].[CompanyName] AS [Company],        [t0].[ContactID], [t0].[ContactName] AS [Name],        [t0].[Phone] FROM   [Contacts] AS [t0] WHERE  [t0].[ContactType] = 'Customer'

The difference between queryTyped and queryFiltered queries lies in the returned type. A queryTyped query returns a sequence of CustomerContact instances, while queryFiltered returns a sequence of the base class Contact. With queryFiltered, you need to explicitly cast the result into a CustomerContact type if you want to access the Company property.

Unique Object Identity

An instance of an entity class stores an in-memory representation of table row data. If you try to instantiate two different entities containing the same row, you obtain a reference to the same inmemory object. In other words, object identity (same references) maintains data identity (same table row) using the entity unique key. The LINQ to SQL engine ensures that the same object reference is used when an entity instantiated from a query result is already in memory. This check does not happen if you create an instance of an entity by yourself. In Listing 5-5, you can see that c1 and c2 reference the same Contact instance, even if they originate from two different queries, while c3 is a different object, even if its content is equivalent to the others.

Note

If you want to force reloading data from the database, you must use the Refresh method of the DataContext class. We will say more about this later in the “Concurrent Operations” section.

Listing 5-5: Object identity

  var queryTyped =     from    c in contacts.OfType<CustomerContact>()     orderby c.ContactID     select  c; var queryFiltered =     from    c in contacts     where   c is CustomerContact     orderby c.ContactID     select  c; Contact c1 = null; Contact c2 = null; foreach( var row in queryTyped.Take(1) ) {     c1 = row; } foreach( var row in queryFiltered.Take(1) ) {     c2 = row; } Contact c3 = new Contact(); c3.ContactID = c1.ContactID; c3.ContactType = c1.ContactType; c3.Name = c1.Name; c3.Phone = c1.Phone; Debug.Assert( c1 == c2 ); // same instance Debug.Assert( c1 != c3 ); // different objects

Entity Constraints

The entity classes cannot represent all possible check constraints of a relational table. No attributes are available to specify the same alternate keys (unique constraint), triggers, and check expressions that can be defined in a relational database. This fact is relevant when you start to manipulate data using entity classes, because you cannot guarantee that an updated value will be accepted by the underlying database. (For example, it could have a duplicated unique key.) However, because you can load into entity instances only parts (rows) of the whole table, these kinds of checks are not possible without accessing the relational database, anyway.

There is partial support for describing a primary key, unique constraint, and other indexes only through XML external metadata specification. You will see this discussed later in the “External Mapping” section. This support is useful only to generate a database starting from LINQ to SQL metadata.

More complete support is available for maintaining valid relationships between entities, just like the support offered by foreign keys in a standard relational environment.

Associations Between Entities

Relationships between entities in a relational database are modeled on the concept of foreign keys referring to primary keys of a table. Class entities can use the same concept through the Association attribute, which can describe both sides of a one-to-many relationship described by a foreign key.

EntityRef Let us start with the concept of lookup, which is the typical operation used to get the customer related to one order. Lookup can be seen as the direct translation into the entity model of the foreign key relationship existing between the CustomerID column of the Orders table and the primary key of the Customers table. In our entity model, the Order entity class will have a Customer property (of type Customer), that shows the customer data. This property is decorated with the Association attribute and stores its information in an EntityRef<Customer> member (named _Customer), which enables the deferred loading of references that you will see shortly. Listing 5-6 shows the definition of this association.

Listing 5-6: Association EntityRef

  [Table(Name="Orders")] public class Order {     [Column(IsPrimaryKey=true)] public int OrderID;     [Column] private string CustomerID;     [Association(Storage="_Customer", ThisKey="CustomerID")]     public Customer Customer {         get { return this._Customer.Entity; }         set { this._Customer.Entity = value; }     }     private EntityRef<Customer> _Customer; }

As you can see, the CustomerID column must be defined in Order because otherwise it would not be possible to obtain the related Customer. The ThisKey argument or the Association attribute indicates the “foreign key” column (which would be a comma-separated list if more columns were involved for a composite key) that is used to define the relationship between entities. If you want to hide this detail in the entity properties, you can declare that column as private, just as in the Order class shown earlier.

Using the Order class in a LINQ query, you can specify a Customer property in a filter without the need to write a join between Customer and Order entities. In the following query, the Country member of the related Customer is used to filter orders that come from customers of a particular Country:

 Table<Order> Orders = db.GetTable<Order>(); var query =     from   o in Orders     where  o.Customer.Country == "USA"     select o.OrderID;

The previous query is translated into an SQL JOIN like the following one:

 SELECT    [t0].[OrderID] FROM      [Orders] AS [t0] LEFT JOIN [Customers] AS [t1]        ON [t1].[CustomerID] = [t0].[CustomerID] WHERE     [t1].[Country] = "USA"

Until now, we have used entity relationships only for their metadata-building LINQ queries. When an instance of an entity class is created, a reference to another entity (such as the previous Customer property) works with a technique called deferred loading. The related Customer entity is not instantiated and loaded into memory from the database until it is accessed either in read or write mode.

More Info

EntityRef<T> is a wrapper class that is instantiated with the container object to give a valid reference for any access to the referenced entity. Each read/write operation is filtered by a property getter and setter, which execute a query to load data from the database the first time this entity is accessed.

In other words, to generate an SQL query to populate the Customer related entity when the Country property is accessed, you would use the following code:

 var query =     from   o in Orders     where  o.OrderID == 10528     select o; foreach( var row in query ) {     Console.WriteLine( row.Customer.Country ); }

The process of accessing the Customer property involves checking to determine whether the related Customer entity is already in memory. If it is, that entity is accessed; otherwise, the following SQL query is executed and the corresponding Customer entity is loaded in memory and then accessed:

 SELECT [t0].[Country], [t0].[CustomerID], [t0].[CompanyName] FROM   [Customers] AS [t0] WHERE  [t0].[CustomerID] = "GREAL"

The GREAL string is the CustomerID value for order 10528. As you can see, the SELECT statement queries all columns declared in the Customer entity, even if they are not used in the expression that accessed the Customer entity. (In this case, the executed code never referenced the CompanyName member.)

EntitySet The other side of an association is a table that is referenced from another table through its primary key. Although this is an implicit consequence of the foreign key constraint in a relational model, you need to explicitly define this association in the entity model. If the Customers table is referenced from the Orders table, you can define an Orders property in the Customer class that represents the set of Order entities related to a given Customer. The relationship is implemented by an instance of EntitySet<Order>, which is a wrapper class over the sequence of related orders. You might want to directly expose this EntitySet<T> type, as in the code shown in Listing 5-7. In that code, the OtherKey argument of the Association attribute specifies the name of the member on the related type (Order) that defines the association between Customer and the set of Order entities.

Listing 5-7: Association EntitySet (visible)

  [Table(Name="Customers")] public class Customer {     [Column(IsPrimaryKey=true)] public string CustomerID;     [Column] public string CompanyName;     [Column] public string Country;     [Association(OtherKey="CustomerID")]     public EntitySet<Order> Orders; }

You might also decide to hide this implementation detail, making only an ICollection<Order> visible outside of the Customer class, as in the declaration shown in Listing 5-8. In this case, the Storage argument of the Association attribute specifies the EntitySet<T> physical storage.

Listing 5-8: Association EntitySet (hidden)

  public class Customer {     [Column(IsPrimaryKey=true)] public string CustomerID;     [Column] public string CompanyName;     [Column] public string Country;     private EntitySet<Order> _Orders;     [Association(OtherKey="CustomerID", Storage="_Orders")]     public ICollection<Order> Orders {         get { return this._Orders; }         set { this._Orders.Assign(value); }     } }

With both models of association declaration, you can use the Customer class in a LINQ query, accessing the related Order entities without the need to write a join. You simply specify the Orders property. The next query returns the names of customers who placed more than 20 orders:

 Table<Customer> Customers = db.GetTable<Customer>(); var query =     from   c in Customers     where  c.Orders.Count > 20     select c.CompanyName;

The previous LINQ query is translated into an SQL query like the following one:

 SELECT [t0].[CompanyName] FROM [Customers] AS [t0] WHERE ( SELECT COUNT(*)         FROM [Orders] AS [t1]         WHERE [t1].[CustomerID] = [t0].[CustomerID]        ) > 20

In this case, no instances of the Order entity are created. The Orders property serves only as a metadata source to generate the desired SQL query. If you return a Customer entity from a LINQ query, you can access the Orders of a customer on demand:

 var query =     from   c in Customers     where  c.Orders.Count > 20     select c; foreach( var row in query ) {     Console.WriteLine( row.CompanyName );     foreach( var order in row.Orders ) {         Console.WriteLine( order.OrderID );     } }

In the previous code, you are using deferred loading. Each time you access the Orders property of a customer for the first time (as indicated by the highlighted code in the preceding code sample), a query like the following one (which uses @p0 as the parameter to filter CustomerID) is sent to the database:

 SELECT [t0].[OrderID], [t0].[CustomerID] FROM [Orders] AS [t0] WHERE [t0].[CustomerID] = @p0

If you want to load all orders for all customers into memory using only one query to the database, you need to request immediate loading instead of deferred loading. To do that, you have two options. The first approach, which is demonstrated in Listing 5-9, is to force the inclusion of an EntitySet using a DataShape instance and the call of its LoadWith<T> method.

Listing 5-9: Use of DataShape and LoadWith<T>

  DataContext db = new DataContext( ConnectionString ); Table<Customer> Customers = db.GetTable<Customer>(); DataShape ds = new DataShape(); ds.LoadWith<Customer>( c => c.Orders ); db.Shape = ds; var query =      from   c in Customers      where  c.Orders.Count > 20      select c;

The second option is to return a new entity that explicitly includes the Orders property for the Customer:

 var query =     from   c in Customers     where  c.Orders.Count > 20     select new { c.CompanyName, c.Orders };

These LINQ queries send an SQL query to the database to get all customers who placed more than 20 orders, including the whole order list for each customer. That SQL query might be similar to the one shown in the following code:

 SELECT [t2].[CompanyName], [t3].[OrderID], [t3].[CustomerID], (     SELECT COUNT(*)     FROM [Orders] AS [t4]     WHERE [t4].[CustomerID] = [t2].[CustomerID]     ) AS [count] FROM (     SELECT [t0].[CustomerID], [t0].[CompanyName]     FROM [Customers] AS [t0]     WHERE (         SELECT COUNT(*)         FROM [Orders] AS [t1]         WHERE [t1].[CustomerID] = [t0].[CustomerID]         ) > 20     ) AS [t2] LEFT OUTER JOIN [Orders] AS [t3] ON [t3].[CustomerID] = [t2].[CustomerID] ORDER BY [t2].[CustomerID], [t3].[OrderID]

Note

You can observe that there is a single SQL statement here and the LINQ to SQL engine parses the result, extracting different entities (Customers and Orders). Keeping the result ordered by CustomerID, the engine can build in-memory entities and relationships in a faster way.

You can filter the subquery produced by relationship navigation. Suppose you want to see only customers who placed at least five orders in 1997, and you want to load and see only these orders. You can use the AssociateWith<T> method of the DataShape class to do that, as demonstrated in Listing 5-10.

Listing 5-10: Use of DataShape and AssociateWith<T>

  DataShape ds = new DataShape(); ds.AssociateWith<Customer>(     c => c.Orders.Where(         o => o.OrderDate.Value.Year == 1997 ) ); db.Shape = ds; var query =      from   c in Customers      where  c.Orders.Count > 5      select c;

You will appreciate that the C# filter condition (o.OrderDate.Value.Year == 1997) is translated into the following SQL expression:

 (DATEPART(Year, [t2].[OrderDate]) = 1997)

Using AssociateWith<T> alone does not apply the immediate loading behavior. If you want both immediate loading and filtering through a relationship, you have to call both the LoadWith<T> and AssociateWith<T> methods. The order of these calls is not relevant. For example, you can write the following code:

 DataShape ds = new DataShape(); ds.AssociateWith<Customer>(     c => c.Orders.Where(         o => o.OrderDate.Value.Year == 1997 ) ); ds.LoadWith<Customer>( c => c.Orders ); db.Shape = ds;

Loading all data into memory using a single query might be a better approach if you are sure you will access all data that is loaded, because you will spend less time in round-trip latency. However, this technique will consume more memory and bandwidth when the typical access to a graph of entities is random. Think about these details when you decide how to query your data model.

Other Association Attributes The Association attribute can also have other parameters. Name corresponds to the foreign key constraint name, Unique defines a real one-to-one relationship, and OtherKey can specify the comma-separated value of members that forms the primary key of the related entity.

Graph Consistency Relationships are bidirectional between entities-when an update is made on one side, the other side should be kept synchronized. LINQ to SQL does not automatically manage this kind of synchronization, which has to be done by the class entity implementation. LINQ to SQL offers an implementation pattern that is also used by code-generation tools such as SQLMetal, a tool that is part of the Microsoft .NET 3.5 Software Development Kit (SDK), which will be described later in this chapter. This pattern is based on the EntitySet<T> class on one side and on the complex setter accessor on the other side. The product documentation offers a detailed explanation of how to implement this pattern if you do not want to rely on tools-generated code.

Change Notification You will see in the “Data Update” section that LINQ to SQL is able to track changes in entities, submitting equivalent changes to the database. This process is implemented by default through an algorithm that compares an object’s content with its original values, requiring a copy of each tracked object. The memory consumption can be high, but it can be optimized if entities participate in the change tracking service by announcing when an object has been changed.

The implementation requires an entity to expose all its data through properties, and each property setter needs to call the PropertyChanging method of DataContext. Entities that implement this pattern should also implement the System.Data.Linq.INotifyPropertyChanging interface. Further details are available in the product documentation. Tools-generated code for entities (such as that emitted by SQLMetal) already implements this pattern.

Note

The System.Data.Linq.INotifyPropertyChanging interface should not be confused with the System.ComponentModel.INotifyPropertyChanged interface that is used in frameworks such as Windows Forms and Windows Presentation Foundation (WPF). These interfaces can work in conjunction with each other when performing data binding in a User Interface layer.

Relational Model vs. Hierarchical Model

The entity model used by LINQ to SQL defines a set of objects that maps the database tables into objects that can be used and manipulated by LINQ queries. The resulting model makes a paradigm shift that has been revealed in describing associations between entities. We moved from a relational model (tables in a database) to a hierarchical or graph model (objects in memory).

A hierarchical/graph model is the natural way to manipulate objects in a program written in C# or Microsoft Visual Basic. When you try to consider how to translate an existing SQL query into a LINQ query, this is the major conceptual obstacle you encounter. In LINQ, you can write a query using joins between separate entities, just as you do in SQL. However, you can also write a query leveraging the existing relationships between entities, as we did with EntitySet and EntityRef associations.

Important

Remember that SQL does not make use of relationships between entities when querying data. Those relationships exist only to define the data integrity conditions. LINQ does not have the concept of referential integrity, but it makes use of relationships to define possible navigation paths into the data.

Data Querying

A LINQ to SQL query is sent to the database only when the program needs to read data. For example, the following foreach loop iterates rows returned from a table:

 var query =     from    c in Customers     where   c.Country == "USA"     select  c.CompanyName;  foreach( var company in query ) {     Console.WriteLine( company ); }

The code generated by the foreach statement is equivalent to the code shown next. The exact moment the query is executed corresponds to the call of GetEnumerator:

 // GetEnumerator sends the query to the database IEnumerator<string> enumerator = query.GetEnumerator(); while (enumerator.MoveNext()) {     Console.WriteLine( enumerator.Current ); }

Writing more foreach loops of the same query generates as many calls to GetEnumerator, and thus an equal number of repeated executions of the same query. If you want to iterate the same data many times, you might prefer to cache data in memory. Using ToList or ToArray, you convert the results of a query into a List or an Array, respectively. When you call these methods, the SQL query is sent to the database:

 // ToList() sends the query to the database var companyNames = query.ToList();

You might want to send the query to the database several times when you manipulate the LINQ query between data iterations. For example, you might have an interactive user interface that allows the user to add a new filter condition for each iteration of data. In Listing 5-11, the DisplayTop method shows only the first few rows of the result; query manipulation between DisplayTop calls simulates a user interaction that ends in a new filter condition each time.

Listing 5-11: Query manipulation

  static void QueryManipulation() {     DataContext db = new DataContext( ConnectionString );     Table<Customer> Customers = db.GetTable<Customer>();     db.Log = Console.Out;     // All Customers     var query =         from    c in Customers         select  new {c.CompanyName, c.State, c.Country };     DisplayTop( query, 10 );     // User interaction add a filter     // to the previous query     // Customers from USA     query =         from   c in query         where  c.Country == "USA"         select c;     DisplayTop( query, 10 );     // User interaction add another     // filter to the previous query     // Customers from WA, USA     query =         from   c in query         where  c.State == "WA"         select c;     DisplayTop( query, 10 ); } static void DisplayTop<T>( IQueryable<T> query, int rows ) {     foreach( var row in query.Take(rows)) {         Console.WriteLine( row );     } }

Important

In the previous example, we used IQueryable<T> as the DisplayTop parameter. If you pass IEnumerable<T> instead, the results would appear identical, but the query sent to the database would not contain the TOP (rows) clause to filter data directly on the database. When using IEnumerable<T>, you use a different set of extension methods to resolve the Take operator without generating a new expression tree.

A common query used for accessing a database is the read of a single row from a table, defining a condition that is guaranteed to be unique, such as a record key. Here is a typical query:

 var query =     from    c in db.Customers     where   c.CustomerID == "ANATR"     select  c; var enumerator = query.GetEnumerator(); if (enumerator.MoveNext()) {     var customer = enumerator.Current;     Console.WriteLine( "{0} {1}", customer.CustomerID, customer.CompanyName ); }

In this case, it might be shorter and more explicit to state your intention by using the Single operator. The previous query can be written in this more compact way:

 var customer = db.Customers.Single( c => c.CustomerID == "ANATR" ); Console.WriteLine( "{0} {1}", customer.CustomerID, customer.CompanyName );

Projections

The transformation from an expression tree to an SQL query requires the complete understanding of the query operations sent to the LINQ to SQL engine. This transformation affects the use of object initializers. You can use projections through the select keyword, as in the following example:

 var query =     from    c in Customers     where   c.Country == "USA"     select  new {c.CustomerID, Name = c.CompanyName.ToUpper()} into r     orderby r.Name     select  r;

The whole LINQ query is translated into this SQL statement:

 SELECT [t1].[CustomerID], [t1].[value] AS [Name] FROM ( SELECT [t0].[CustomerID],               UPPER([t0].[CompanyName]) AS [value],               [t0].[Country]        FROM [Customers] AS [t0]      ) AS [t1] WHERE    [t1].[Country] = "USA" ORDER BY [t1].[value]

As you can see, the ToUpper method has been translated into an UPPER T-SQL function call. To do that, the LINQ to SQL engine needs a deep knowledge of the meaning of any operation in the expression tree. Consider this query:

 var queryBad =     from    c in Customers     where   c.Country == "USA"     select  new CustomerData( c.CustomerID, c.CompanyName.ToUpper()) into r     orderby r.Name     select  r;

In this case, we call a constructor of the CustomerData type that can do anything a piece of Intermediate Language (IL) code can do. In other words, there is no semantic value in calling a constructor other than the initial assignment of the instance created. The consequence is that LINQ to SQL cannot correctly translate this syntax into an equivalent SQL code, and it throws an exception if you try to execute the query. However, you can safely use a parameterized constructor in the final projection of a query, as in the following sample:

 var queryParamConstructor =     from    c in Customers     where   c.Country == "USA"     orderby c.CompanyName     select  new CustomerData( c.CustomerID, c.CompanyName.ToUpper() );

If you only need to initialize an object, use the object initializers instead of a parameterized constructor call, as in following query:

 var queryGood =     from    c in Customers     where   c.Country == "USA"     select  new CustomerData { CustomerID = c.CustomerID,                                Name = c.CompanyName.ToUpper() } into r     orderby r.Name     select  r;

Important

Always use object initializers to encode projections in LINQ to SQL. Use parameterized constructors only in the final projection of a query.

Stored Procedures

Accessing data through stored procedures and user-defined functions requires the definition of corresponding methods decorated with attributes. This enables you to write LINQ queries in a strongly typed form. From the LINQ point of view, there is no difference if a stored procedure or user-defined function is written in T-SQL or SQLCLR.

Consider the Customers by City stored procedure:

 CREATE PROCEDURE [dbo].[Customers By City]( @param1 NVARCHAR(20) ) AS BEGIN     SET NOCOUNT ON;     SELECT CustomerID, ContactName, CompanyName, City     FROM   Customers AS c     WHERE  c.City = @param1 END

You can define a method decorated with a StoredProcedure attribute that calls the stored procedure through the ExecuteMethodCall<T> method of the DataContext class. In Listing 5-12, we define CustomersByCity as a member of a class derived from DataContext.

Listing 5-12: Stored procedure declaration

  class SampleDb : DataContext {     // ...     [StoredProcedure( Name = "dbo.[Customers By City]" )]     public IEnumerable<CustomerInfo> CustomersByCity( string param1 ) {         return (IEnumerable<CustomerInfo>)             this.ExecuteMethodCall<CustomerInfo>(                 this,                 ((MethodInfo) (MethodInfo.GetCurrentMethod())),                 param1 );     } }

The returned type implements IEnumerable<CustomerInfo> and can be enumerated in a foreach statement like this one:

 SampleDb db = new SampleDb( ConnectionString ); foreach( var row in db.CustomersByCity( "Seattle" )) {     Console.WriteLine( "{0} {1}", row.CustomerID, row.CompanyName ); }

You will find many more details on stored procedure declarations in the product documentation. You can have output parameters, integer results, and multiple resultsets. You always need to know the metadata of all possible returned resultsets. Whenever you have multiple resultsets from a stored procedure, you will use the IMultipleResults return type, calling one GetResult<T> method for each resultset sequentially and specifying the right T type for the expected result. Consider the following stored procedure that returns two resultsets with different structures:

 CREATE PROCEDURE TwoCustomerGroups AS BEGIN     SELECT  CustomerID, ContactName, CompanyName, City     FROM   Customers AS c     WHERE  c.City = 'London'     SELECT  CustomerID, CompanyName, City     FROM   Customers AS c     WHERE  c.City = 'Torino' END

The declaration of the LINQ counterpart should be like the one shown in Listing 5-13.

Listing 5-13: Stored procedure with multiple results

  class SampleDb : DataContext {     // ...     [StoredProcedure(Name="TwoCustomerGroups")]     public IMultipleResults TwoCustomerGroups() {         return (IMultipleResults)             this.ExecuteMethodCallWithMultipleResults(                 this,                 (MethodInfo) (MethodInfo.GetCurrentMethod()) );     } }

Each resultset has a different type. When calling each GetResult<T>, you need to specify the right type, which needs at least a public member with the same name of each returned column. If you specify a type with more public members than available columns, these “missing” members will have a default value. In the next sample, the first resultset must match the CustomerInfo type, while the second resultset must correspond to the CustomerShortInfo type:

 IMultipleResults results = db.TwoCustomerGroups(); foreach( var row in results.GetResult<CustomerInfo>()){         // Access to CustomerInfo instance     }     foreach( var row in results.GetResult<CustomerShortInfo>()) {         // Access to CustomerShortInfo instance     }

The declaration for CustomerInfo and CustomerShortInfo does not require any attribute. You can also use inheritance if the same structure is present in different resultsets, but remember that in this case the use is driven by GetResult usage and not by the InheritanceMapping attribute that you have seen in entity class declaration:

 public class CustomerShortInfo {     public string CustomerID;     public string CompanyName;     public string City; } public class CustomerInfo : CustomerShortInfo{     public string ContactName; }

Important

The fact that you can do something does not imply that it is the right thing to do in any case. The use of inheritance to differentiate the result type of a stored procedure has a demonstrative purpose-to illustrate how the mapping works. You can also use a single type containing all possible columns returned from all resultsets. It is your responsibility to choose the right way.

User-Defined Functions

A user-defined function (UDF) needs the same kind of declaration you have just seen for stored procedures. A UDF can be used inside a LINQ query; thus, it must be considered from the LINQ to SQL engine in the SQL statement construction. Remember that there is no difference if a UDF is written in T-SQL or SQLCLR.

Listing 5-14 provides an example of a LINQ declaration of the scalar-valued UDF MinUnitPriceByCategory that is defined in the sample Northwind database.

Listing 5-14: Scalar-valued user-defined function

  class SampleDb : DataContext {     // ...     [Function(Name="dbo.MinUnitPriceByCategory")]     public decimal MinUnitPriceByCategory( int categoryID ) {         IExecuteResults mc =             this.ExecuteMethodCall(                 this,                 (MethodInfo) (MethodInfo.GetCurrentMethod()),                 categoryID );         return (decimal)mc.ReturnValue;     } }

The call of a UDF as an isolated expression generates a single SQL query invocation. You can also use a UDF in a LINQ query like the following one:

 var query =     from   c in Categories     select new { c.CategoryID,                  c.CategoryName,                  MinPrice = db.MinUnitPriceByCategory( c.CategoryID )};

The generated SQL query looks like this:

 SELECT [t0].[CategoryID],        [t0].[CategoryName],        dbo.MinUnitPriceByCategory([t0].[CategoryID]) AS [value] FROM   [Categories] AS [t0]

A table-valued UDF has a different declaration that must match the returned type, as demonstrated in Listing 5-15.

Listing 5-15: Table-valued user-defined function

  class SampleDb : DataContext {     // ...     [Function(Name="dbo.CustomersByCountry")]     public IQueryable<Customer> CustomersByCountry( string country ) {         return (IQueryable<Customer>)             this.ExecuteMethodCall<Customer>(                 this,                 (MethodInfo) MethodInfo.GetCurrentMethod(),                 country );     } }

A table-valued UDF can be used like any other table in a LINQ query. For example, you can join customers returned by the previous UDF with the orders made by them, as in the following query:

 Table<Order> Orders = db.GetTable<Order>(); var queryCustomers =     from   c in db.CustomersByCountry( "USA" )     join   o in Orders            on c.CustomerID equals o.CustomerID            into orders     select new { c.CustomerID, c.CompanyName, orders };

The generated SQL query will be similar to this one:

 SELECT [t0].[CustomerID], [t0].[CompanyName],        [t0].[City], [t0].[Region], [t0].[Country] FROM   dbo.CustomersByCountry('USA') AS [t0]

Compiled Queries

If you need to repeat the same query many times, eventually with different argument values, you might be worried about the multiple query construction. Several databases, such as SQL Server, try to auto-parameterize received SQL queries to optimize the compilation of the query execution plan. However, the program that sends a parameterized query to SQL Server will get better performance, because SQL Server does not spend time to analyze it if the query is similar to another one already processed. LINQ already does a fine job of query optimization, but each time that the same query tree is evaluated, the LINQ to SQL engine parses the query tree to build the equivalent SQL code. You can optimize this behavior by using the CompiledQuery class.

More Info

LINQ providers are in charge of creating optimized queries. For example, the built-in SQL Server provider sends parameterized queries to the database. Every time you see a constant value in the SQL code presented in this chapter, keep in mind that the real SQL query sent to the database has a parameter for each constant in the query. That constant can be the result of an expression that is independent from the query execution. This kind of expression is resolved by the host language (C# in this case). When you use the CompiledQuery class, the parsing of the query tree and the creation of the equivalent SQL code is the operation that is not repeated every time LINQ has to process the same query.

To compile a query, you can use the CompiledQuery.Compile static method. This approach passes the LINQ query as a parameter in the form of a lambda expression, and then obtains a delegate with arguments corresponding to both the DataContext on which you want to operate and the parameters of the query. Listing 5-16 illustrates compiled query declaration and use.

Listing 5-16: Compiled query in a local scope

  static void CompiledQueriesLocal() {     DataContext db = new DataContext( ConnectionString );     Table<Customer> Customers = db.GetTable<Customer>();     var query =         CompiledQuery.Compile(             ( DataContext context, string filterCountry ) =>                 from c in Customers                 where c.Country == filterCountry                 select new { c.CustomerID, c.CompanyName, c.City } );     foreach (var row in query( db, "USA" )) {         Console.WriteLine( row );     }     foreach (var row in query( db, "Italy" )){         Console.WriteLine( row );     } }

As you can see in the preceding example, the Compile method requires a lambda expression whose first argument is a DataContext instance. That argument defines the connection over which the query will be executed. Assigning the CompiledQuery.Compile result to a local variable is easy (because you declare that variable with var) but probably not very frequent. Chances are that you need to store the delegate returned from CompiledQuery.Compile in an instance or a static member to easily reuse it several times. To do that, you need to know the right declaration syntax.

A compiled query is stored in a Func delegate, where the first argument must be an instance of DataContext (or a derived class) and the last argument must be the type returned from the query. You can define other arguments in the middle that will be arguments of the compiled query and will need to be specified for each compiled query invocation. Listing 5-17 shows the syntax you can use in this scenario to create the compiled query and then use it.

Listing 5-17: Compiled query assigned to a static member

  public static Func< nwind.Northwind, string, IQueryable<nwind.Customer>>     CustomerByCountry =         CompiledQuery.Compile(             ( nwind.Northwind db, string filterCountry ) =>                 from c in db.Customers                 where c.Country == filterCountry                 select c ); static void CompiledQueriesStatic() {     nwind.Northwind db = new nwind.Northwind( ConnectionString );     foreach (var row in CustomerByCountry( db, "USA" )) {         Console.WriteLine( row.CustomerID );     }     foreach (var row in CustomerByCountry( db, "Italy" )) {         Console.WriteLine( row.CustomerID );     } }

Different Approaches to Querying Data

When using LINQ to SQL entities, you have two approaches to querying the same data. The classical way to navigate a relational schema is to write associative queries, just as you can do in SQL. The alternative way offered by LINQ to SQL is through graph traversal. Given the same result, we might obtain different SQL queries and a different level of performance.

Consider this SQL query that calculates the total quantity of orders for a product (in this case, Chocolade, which is a localized name in the Northwind database):

 SELECT    SUM( od.Quantity ) AS TotalQuantity FROM      [Products] p LEFT JOIN [Order Details] od      ON   od.[ProductID] = p.[ProductID] WHERE     p.ProductName = 'Chocolade' GROUP BY  p.ProductName

The natural conversion into a LINQ query is shown in Listing 5-18. The Single operator gets the first row and puts it into quantityJoin, which is used to display the result.

Listing 5-18: Query with Join

  var queryJoin =     from   p in db.Products     join   o in db.Order_Details            on p.ProductID equals o.ProductID            into OrdersProduct     where  p.ProductName == "Chocolade"     select OrdersProduct.Sum( o => o.Quantity ); var quantityJoin = queryJoin.Single(); Console.WriteLine( quantityJoin );

As you can see, the associative query in LINQ can explicitly require the join between Products and Order_Details through ProductID equivalency. By leveraging entities, you can implicitly use the relationship between Products and Order_Details defined in the Product class, as shown in Listing 5-19.

Listing 5-19: Query using Association

  var queryAssociation =     from   p in db.Products     where  p.ProductName == "Chocolade"     select p.Order_Details.Sum( o => o.Quantity ); var quantityAssociation = queryAssociation.Single(); Console.WriteLine( quantityAssociation );

The single SQL queries produced by both of these LINQ queries are identical. The LINQ query with join is more explicit about the access to data, while the query that uses the association between Product and Order_Details is more implicit in this regard. Using implicit associations results in shorter queries that are less error-prone (because you cannot be wrong about the join condition). At first, you might find that a shorter query is not easier to read. However, this perception might arise because you are accustomed to seeing lengthier queries, and your comfort level with shorter ones could change over time.

Looking further, you can observe that reading a single product does not require a query expression. You can apply the Single operator directly on the Products table, as shown in Listing 5-20.

Listing 5-20: Access through Entity

  var chocolade = db.Products.Single( p => p.ProductName == "Chocolade" ); var quantityValue = chocolade.Order_Details.Sum( o => o.Quantity ); Console.WriteLine( quantityValue );

This is a two-step operation that sends two SQL queries to the database. The first one retrieves the Product entity, and the second one accesses the Order Details table to calculate the total quantity for the required product. At first glance, this kind of access is shorter to write compared to a query, but its performance is worse. This conclusion is true if you consider the single operation isolated from any other aspect. It might be false in a real-world scenario.

The unique identity management of LINQ to SQL grants that a single instance of an entity exists in memory. If you know that a Product entity is already in memory for Chocolade when you need to calculate the total quantity of orders, or that the same Product entity will be used later for other purposes, this double access can be balanced by fewer instances of accessing the whole program. The previous queries did not create Product instances because only the total for the product was required as output. From this point of view, if we already had a Product instance for Chocolade in memory, the performance of queries would be worse because they make a useless join with Product just to transform the product name (Chocolade) into its corresponding ProductID.

A final thought on the number of generated queries: You might think that we generated two queries when accessing data through the Product entity because we had two distinct statements-one to assign the chocolade variable, and the other to assign a value to quantityEntity. This assumption is not completely true. Even if you write a single statement, the use of a Product entity (the results from the Single operator call) generates a separate query. Listing 5-21 produces the same results (in terms of memory objects and SQL queries) as Listing 5-20.

Listing 5-21: Access through Entity with a single statement

  var quantityChocolade = db.Products.Single( p => p.ProductName == "Chang" )                         .Order_Details.Sum( o => o.Quantity ); Console.WriteLine( quantityChocolade );

Finding a better way to access data really depends on the whole set of operations performed by a program. If you extensively use entities in your code to store data in memory, access to data through graph traversal based on entity access might offer better performance. On the other hand, if you always transform query results in anonymous types and never manipulate entities in memory, you might prefer an approach based on LINQ queries. As always, the right answer is, “It depends.”

Direct Queries

Sometime you might need access to some database SQL features that are not available with LINQ. For example, imagine that you want to use Common Table Expressions (CTE) or the PIVOT command with SQL Server. LINQ does not have an explicit constructor to do that, even if (in the future) its SQL Server provider could use these features to optimize some queries. Listing 5-22 shows how you can use the ExecuteQuery<T> method of the DataContext class to send a query directly to the database. The T in ExecuteQuery<T> is an entity class that represents a returned row.

Listing 5-22: Direct query

  var query = db.ExecuteQuery<EmployeeInfo>( @"     WITH EmployeeHierarchy (EmployeeID, LastName, FirstName,          ReportsTo, HierarchyLevel) AS     (  SELECT EmployeeID,LastName, FirstName,               ReportsTo, 1 as HierarchyLevel        FROM   Employees        WHERE  ReportsTo IS NULL        UNION ALL        SELECT      e.EmployeeID, e.LastName, e.FirstName,                    e.ReportsTo,eh.HierarchyLevel + 1 AS HierarchyLevel        FROM        Employees e        INNER JOIN  EmployeeHierarchy eh                ON  e.ReportsTo = eh.EmployeeID     )     SELECT *     FROM EmployeeHierarchy     ORDER BY HierarchyLevel, LastName, FirstName" ); foreach (var row in query) {     Console.WriteLine( row ); }

Warning

Columns in the resulting rows that do not match entity attributes are ignored. Entity members that do not have corresponding columns are initialized with the default value. If the EmployeeInfo class would contain a mismatched column name, that member would be not assigned without an error. Be careful and check name correspondence when some column or member values are not filled in the result.

Read-Only DataContext Access

If you need to access data only in a read-only way, you might want to improve performance by disabling a DataContext service that supports data modification:

 DataContext db = new DataContext( ConnectionString ); db.ObjectTracking = false; var query = ...

The ObjectTracking property controls the change tracking service that we will describe in the next section.

Data Update

All entity instances are tracked by the identity management service of LINQ to SQL to keep a unique instance of a row of data. This service is guaranteed only for objects created or handled by DataContext. (This has implications that you will see shortly.) Keeping a single instance of a row of data allows the manipulation of in-memory objects without concern for potential data inconsistencies or duplication in memory. We will analyze how to deal with concurrent operations later.

Important

Remember that a class entity must have at least a column with the IsPrimaryKey=true setting in the Column attribute; otherwise, it cannot be tracked by the identity management service, and data manipulation is not allowed.

Entity Updates

Changing data members and properties of an entity instance is an operation tracked by the change tracking service of LINQ to SQL. This service retains the original value of a modified entity. With this information, the service generates a corresponding list of SQL statements that make the same changes on the database. You can see these SQL statements by calling the GetChangeText method on DataContext:

 var customer = db.Customers.Single( c => c.CustomerID == "FRANS" ); customer.ContactName = "Marco Russo"; Console.WriteLine( db.GetChangeText() );

The output from the previous code is similar to that shown here:

 UPDATE [Customers] SET    [ContactName] = "Marco Russo" FROM   [Customers] WHERE  ...

We will discuss the WHERE condition later. Remember that SQL statements in the list are not sent to the database until the call to the SubmitChanges method is made.

If you want to add a record to a table or remove a record from a table, creating or deleting an object in memory is not enough. The DataContext instance must be notified also. This can be done directly by calling Add or Remove on the corresponding Table collection (these methods operate on the in-memory copy of the data; a subsequent SubmitChanges call will forward the SQL commands to the database):

 var newCustomer = new Customer {                        CustomerID = "DLEAP",                        CompanyName = "DevLeap",                        Country = "Italy" }; db.Customers.Add( newCustomer ); var oldCustomer = db.Customers.Single( c => c.CustomerID == "FRANS" ); db.Customers.Remove( oldCustomer );

Looking at the generated SQL statements, you will see that a single INSERT is generated to add a new customer, while the deletion of a customer might generate many UPDATE statements before the DELETE one. This is because of the need to “break” the relationship between a customer and its orders. Related orders are not deleted, but simply “disconnected” from the customer that was tied to them.

 INSERT INTO [Customers](CustomerID, CompanyName, ...) VALUES("DEVLEAP", "DevLeap", ...) UPDATE [Orders] SET    [CustomerID] = NULL FROM   [Orders] WHERE  ([OrderID] = @p1) AND ... -- Other UPDATE statements here, one for each Order of the removed customer DELETE FROM [Customers] WHERE [CustomerID] = "FRANS"

Note

Calling Add or Remove several times for the same object (entities have a unique identity) will not generate the same SQL statement multiple times.

Another way to notify the DataContext of a new entity is to attach the new entity to an existing object already tracked by DataContext:

 var newCustomer = new Customer {                        CustomerID = "DLEAP",                        CompanyName = "DevLeap",                        Country = "Italy" }; var order = db.Orders.Single( o => o.OrderID == 10248 ); order.Customer = newCustomer;

The examples just shown introduced the need to understand how relationships between entities work when updates are applied to the database. Relationships are bidirectional between entities, and when an update is made on one side, the other side should be kept synchronized. This has to be done by the class entity implementation. Entity classes generated by code-generation tools (such as SQLMetal) usually offer this level of service.

The previous operation inserted a customer tied to order 10248. If you explore the newCustomer entity after the order.Customer assignment, you will see that its Orders properties contain order 10248. Executing the following code will display one row containing the order 10248:

 foreach( var o in newCustomer.Orders ) {     Console.WriteLine( "{0}-{1}", o.CustomerID, o.OrderID ); }

You can work in the opposite way, assigning an order to the Orders properties of a customer. Consequently, the Customer property of the moved order will be updated.

 var oldCustomer = db.Customers.Single( c => c.CustomerID == "VINET" ); var newCustomer = new Customer {                        CustomerID = "DLEAP",                        CompanyName = "DevLeap",                        Country = "Italy" }; db.Customers.Add( newCustomer ); var order = oldCustomer.Orders.Single( o => o.OrderID == 10248 ); oldCustomer.Orders.Remove( order ); newCustomer.Orders.Add( order );

You have seen that there are two ways to add a record to a table (one direct and one indirect). However, if you need to remove a row, you always have to do this in a direct way, calling the Remove method on the corresponding Table collection. When you remove an object, related entities are unbound (that is, the foreign key is set to NULL), but this might throw an exception if constraints do not allow NULL values. If you also want to remove the “child” objects of a removed object, you have to call the Remove method on them. You can do that by leveraging the RemoveAll method:

 var order = db.Orders.Single( o => o.OrderID == 10248 ); db.Order_Details.RemoveAll( order.Order_Details ); db.Orders.Remove( order );

This update at the moment of calling SubmitChanges will generate SQL statements that respect the referential integrity constraints shown in the following statements:

 DELETE FROM [Order Details] WHERE ([OrderID] = 10248) AND ([ProductID] = 11) DELETE FROM [Order Details] WHERE ([OrderID] = 10248) AND ([ProductID] = 42) DELETE FROM [Order Details] WHERE ([OrderID] = 10248) AND ([ProductID] = 72) DELETE FROM [Orders] WHERE [OrderID] = 10248

After a call to SubmitChanges, all tracked changes history is thrown away.

Important

As we have already seen, you can disable the change tracking service for a DataContext by specifying false on its ObjectTracking property. Whenever you need to get data only in a read-only way-for example, to display a report or a Web page in a noninteractive mode-this setting will improve overall performance.

Database Updates

With LINQ to SQL, many SQL queries are sent to the database in a transparent and implicit way. On the other hand, all SQL commands that modify the state of the database are sent only when you decide to do that, calling SubmitChanges on the DataContext object (which is eventually derived), as shown in Listing 5-23.

Listing 5-23: Submit changes to the database

  Northwind db = new Northwind( Program.ConnectionString ); var customer = db.Customers.Single( c => c.CustomerID == "FRANS" ); customer.ContactName = "Marco Russo"; db.SubmitChanges();

Concurrent Operations

Operating with in-memory entities in LINQ is a form of disconnected operation on data. In these cases, you always have to deal with concurrent operations made by other users or connections between the read of data and its successive updates. Usually, you operate with optimistic concurrency. In the case of a conflict, a ChangeConflictException error is thrown by default. This exception contains a Conflicts collection that explains the reasons for the error. (There can be several conflicts on different tables in a single SubmitChanges call.) Listing 5-24 provides a demonstration.

Listing 5-24: Retry loop for a concurrency conflict

  Northwind db2 = new Northwind( Program.ConnectionString ); for( int retry = 0; retry < 4; retry++ ) {     var customer2 = db2.Customers.Single( c => c.CustomerID == "FRANS" );     // Another connection updates database table here     customer2.ContactName = "Paolo Pialorsi";     try {         db2.SubmitChanges(); // May throw exception         break;               // Exit the "for" loop if submit succeed     }     catch (ChangeConflictException ex) {         Console.WriteLine( ex.Message );         db2.Refresh( customer2, RefreshMode.KeepChanges );     } }

Important

After a conflict, you might decide to re-read all the data or rely on the Refresh method, as demonstrated in the previous code sample. The RefreshMode.KeepChanges argument keeps the data found in the concurrent update if it was unchanged in the updated entity. Other RefreshMode values are KeepCurrentValues and OverwriteCurrentValues, which specify different behaviors. See the product documentation for a detailed explanation.

SubmitChanges can have a parameter specifying whether you want to stop at the first conflict or try all updates regardless of the conflict. The default is to stop at the first conflict:

 db.SubmitChanges(ConflictMode.FailOnFirstConflict); db.SubmitChanges(ConflictMode.ContinueOnConflict);

You can control how the concurrency conflict is determined through entity class definition. Each Column attribute can have an UpdateCheck argument that can have one of the following three values:

Always Always use this column (which is the default) for conflict detection.
Never Never use this column for conflict detection.
WhenChanged Use this column only when the member has been changed by the application.

Other options in column definitions are represented by two Boolean flags: IsDBGenerated identifies that the value is auto-generated by the database, and IsVersion identifies a database timestamp or a version number. If a column has IsVersion set to true, the concurrency conflict is identified and only the entity unique key and its timestamp/version column are compared.

Note

Using IsVersion simplifies the query sent to the database to check concurrency conflict-updates and deletes can have a long WHERE condition if an IsVersion column is not specified.

IsDBGenerated and IsVersion require a SELECT to be submitted after the UPDATE or INSERT operation. The tradeoff between having an IsVersion column or not depends on the number and complexity of table columns.

Transactions

A SubmitChanges call automatically starts a database explicit transaction, using IDbConnection .BeginTransaction and applying all changes made in memory to the database inside the same transaction. Using the TransactionScope class contained in the System.Transactions library since .NET 2.0, you can add any standard command to the database or change any other transactional resource in the same transaction, which eventually will be transparently promoted to a distributed transaction. Listing 5-25 is an example of a transaction controlled in this way.

Listing 5-25: Transaction controlled by TransactionScope

  using(TransactionScope ts = new TransactionScope()) {     Product prod = db.Products.Single(p => p.ProductID == 42);     if (prod.UnitsInStock > 0) {         prod.UnitsInStock--;     }     db.SubmitChanges();     ts.Complete(); }

In the case of an exception, the database transaction is canceled. If you have an existing ADO.NET application that does not use System.Transactions, you can control database transactions by accessing the Transaction property of DataContext.

Stored Procedures

You can override the default insert, update, and delete SQL statements generated by LINQ to SQL when submitting changes. To do that, you can define one or more methods with specific signatures and pattern names. This is the syntax to use-note that you need to replace the name of the modified type to TYPE:

 public void UpdateTYPE(TYPE original, TYPE current) { ... } public void InsertTYPE(TYPE inserted) { ... } public void DeleteTYPE(TYPE deleted) { ... }

Important

The name of the method is important. The LINQ to SQL engine looks for a method with a matching signature and that has a name that starts with the word corresponding to the operation you override (Update, Insert, or Delete) followed by the name of the modified type.

Usually, this particular override is used to call stored procedures instead of sending SQL statements to execute data manipulation on the database. These methods have to be defined on the DataContext derived class. Because a derived class is already generated by some tool (such as SQLMetal or the LINQ to SQL designer in Microsoft Visual Studio), you can add your methods by using the partial class syntax, as shown in Listing 5-26.

Listing 5-26: Stored procedure to override an update

  public partial class Northwind : DataContext {     public void UpdateProduct(Product original, Product current) {         // Execute the stored procedure for UnitsInStock update         if (original.UnitsInStock != current.UnitsInStock) {             int rowCount = this.ExecuteCommand(                             "exec UpdateProductStock " +                             "@id={0}, @originalUnits={1}, @decrement={2}",                             original.ProductID,                             original.UnitsInStock,                             (original.UnitsInStock - current.UnitsInStock) );             if (rowCount < 1) {                 throw new OptimisticConcurrencyException();             }         }     } }

Important

Conflict detection is your responsibility if you decide to override insert, update, and delete methods.

Binding Metadata

The mapping between LINQ to SQL entities and database structures has to be described through metadata information. Until now, you have seen attributes on entity definition fulfilling this rule. There is an alternative way to do this (using external XML mapping file), and there are tools and methods that automate the generation of entity classes starting from a database and vice versa.

Creating a Database from Entities

An application that auto-installs itself on a machine might be interested in creating a database that can persist its objects graph. This is a typical situation-you need to handle simple configurations that are represented by a graph of objects.

If you have a class derived from DataContext that contains entity definitions decorated with Table and Column attributes, you can create the corresponding database by calling the CreateDatabase method. This method sends the necessary CREATE DATABASE statement as well as the subsequent CREATE TABLE and ALTER TABLE statements:

 const string ConnectionString =     "Database=Test_Northwind;Trusted_Connection=yes"; public static void Create() {     Northwind db = new Northwind( ConnectionString );     db.CreateDatabase(); }

You can also drop a database and check for its existence. The name of the database is inferred from the connection string. You can duplicate a database schema into several databases simply by changing the connection string:

 if (db.DatabaseExists()) {     db.DeleteDatabase();   // Send a DROP DATABASE } db.CreateDatabase();

Creating Entities from a Database

If you already have an existing physical data layer, you might want to create a set of entity classes for an existing database. You can use two available tools: SQLMetal and the LINQ to SQL designer integrated in Visual Studio.

SQLMetal is a command-line tool. It generates a source file containing entity class declarations based on metadata read from the specified database. This process can be split into two steps: First, an XML file describing the database structure is created. That XML file can be edited by adding or removing desired entities, constraints, or both. At the end, the XML file is used to generate source code files for entity classes. The following commands create the XML file from database metadata and the source code from the XML file, respectively:

 SqlMetal /server:localhost /database:Northwind /pluralize /xml:Northwind.xml SqlMetal /namespace:northwind /code:Northwind.cs Northwind.xml

See the product documentation for details about SQLMetal and the related XML file. This same XML file can be used as a basis for the LINQ external mapping feature.

The LINQ to SQL designer integrated in Visual Studio allows the creation of LINQ entity classes in a more interactive way-you have a graphical editor that allows the design of a graphical schema of entities. To open the designer, use the Project / Add New Item command in Visual Studio 2005 and choose the Linq to SQL File template.

External Mapping

An XML file similar to the one created by the SQLMetal utility can be used as a mapping file to decorate entity classes instead of using attributes. The additions to the automatically generated file are a Type element and a Member attribute that bind the entity classes and columns to their respective mapping information. An XML mapping file will appear that looks like the following code sample:

 <Database Name="Northwind">    <Table Name="Products">        <Type Name=" Product">            <Column Name="ProductID" Member="ProductID"                    Storage="_ProductID" DbType="Int NOT NULL IDENTITY"                    IsPrimaryKey="True" IsDbGenerated="True" />

The XML file can be loaded using an XmlMappingSource instance generated by calling its FromXml static method. The following example shows how to use such syntax:

 string path = "Northwind.xml"; XmlMappingSource prodMapping =         XmlMappingSource.FromXml(File.ReadAllText(path)); Northwind db = new Northwind(         @"Database=Test_Northwind;Trusted_Connection=yes",         prodMapping     );

One possible use of this technique is a scenario in which different databases must be mapped to a specific data model. Differences in databases might be table and field names (for example, localized versions of the database). In general, consider this option when you need to realize “light” decoupling of mapping between entity classes and the physical data structure of the database.

Differences Between .NET and SQL Type Systems

The product documentation illustrates all types of system differences between the .NET Framework and LINQ to SQL. Many operators require a specific conversion, such as cast operations and the ToString method that are converted in CAST or CONVERT operators in SQL translation. There could be significant differences if your code is sensitive to rounding differences. (Math.Round and ROUND have different logic. See the MidpointRounding enumeration used to control the behavior.) There are also minor differences in date and time manipulation. Above all, you need to remember that SQL Server supports DATETIME but not DATE. See the documentation for further details.