What Is LINQ? | Introducing Microsoft® LINQ

LINQ is a programming model that introduces queries as a first-class concept into any Microsoft .NET language. However, complete support for LINQ requires some extensions in the language used. These extensions boost productivity by providing a shorter, more meaningful, and more expressive syntax for manipulating data.

Following is a simple LINQ query for a typical software solution that returns the names of customers in Italy:

    var query =
        from   c in Customers
        where  c.Country == "Italy"
        select c.CompanyName;

Do not worry about syntax and keywords (such as var) for now. The result of this query is a sequence of strings. You can enumerate these values with a foreach loop in C#:

    foreach ( string name in query ) {
        Console.WriteLine( name );
    }

Both the query definition and the foreach loop just shown are regular C# 3.0 statements. At this point, you might wonder what we are querying. What is Customers? Is this query a new form of embedded SQL? Not at all. The same query (and the following foreach code) can be applied to an SQL database, to a DataSet, to an array of objects in memory, or to many other kinds of data. Customers could be a collection of objects:

 Customer[] Customers;

Customers could be a DataTable in a DataSet:

    DataSet ds = GetDataSet();
    DataTable Customers = ds.Tables["Customers"];

Customers could be an entity class that describes a physical table in a relational database:

    DataContext db = new DataContext( ConnectionString );
    Table<Customer> Customers = db.GetTable<Customer>();

Finally, Customers could be an entity class that describes a conceptual model and is mapped to a relational database:

    NorthwindModel dataModel = new NorthwindModel();
    ObjectQuery<Customer> Customers = dataModel.Customers;
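To make the first of these cases concrete, here is a minimal sketch that runs the query over an array of objects in memory. Only the query itself comes from the text above; the Customer class, the InMemoryQueryDemo wrapper, and the sample data are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity: only the members used by the query are declared.
public class Customer
{
    public string CompanyName;
    public string Country;
}

public static class InMemoryQueryDemo
{
    // Same query as in the text, applied to an array of objects in memory.
    public static IEnumerable<string> ItalianCompanyNames( Customer[] Customers )
    {
        var query =
            from   c in Customers
            where  c.Country == "Italy"
            select c.CompanyName;
        return query;
    }

    public static void Main()
    {
        // Sample data invented for this sketch.
        Customer[] Customers = {
            new Customer { CompanyName = "Caffe Torino", Country = "Italy" },
            new Customer { CompanyName = "Dallas Freight", Country = "USA" },
            new Customer { CompanyName = "Pasta Roma", Country = "Italy" }
        };

        foreach ( string name in ItalianCompanyNames( Customers ) ) {
            Console.WriteLine( name );
        }
    }
}
```

The using System.Linq directive is what brings the in-memory query operators into scope for the array.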

As you will see, the SQL-like syntax used in LINQ is called a query expression. Languages that implement embedded SQL define only a simplified syntax to put SQL statements into a different language, but these statements are not integrated into the language’s native syntax and type system. For example, you cannot call a function written using the host language in the middle of an SQL statement, although this is possible in LINQ. Moreover, LINQ is not limited to querying databases, as embedded SQL is.
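The point about calling host-language functions can be illustrated with a short sketch over in-memory data. IsInCountry is a hypothetical helper written for this example, as are the Customer class and the sample data. Whether a provider that translates queries into SQL accepts such a call depends on that provider, so this sketch makes no claim beyond the in-memory case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only by this sketch.
public class Customer
{
    public string CompanyName;
    public string Country;
}

public static class HostFunctionDemo
{
    // An ordinary C# method (hypothetical helper, not part of LINQ).
    public static bool IsInCountry( Customer c, string country )
    {
        return String.Equals( c.Country, country,
                              StringComparison.OrdinalIgnoreCase );
    }

    public static List<string> NamesIn( IEnumerable<Customer> customers,
                                        string country )
    {
        var query =
            from   c in customers
            where  IsInCountry( c, country )          // host-language call
            select c.CompanyName.ToUpperInvariant();  // and another one
        return query.ToList();
    }

    public static void Main()
    {
        // Sample data invented for this sketch.
        Customer[] customers = {
            new Customer { CompanyName = "Caffe Torino", Country = "Italy" },
            new Customer { CompanyName = "Dallas Freight", Country = "USA" }
        };

        foreach ( string name in NamesIn( customers, "italy" ) ) {
            Console.WriteLine( name );
        }
    }
}
```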

How LINQ Works

Now that you have seen how queries can be integrated into the syntax of a language, you may want to see how this works. When you write the following code:

    Customer[] Customers = GetCustomers();
    var query =
        from   c in Customers
        where  c.Country == "Italy"
        select c;

the compiler generates this code:

    Customer[] Customers = GetCustomers();
    IEnumerable<Customer> query =
            Customers
            .Where( c => c.Country == "Italy" );

From now on, we will skip the Customers declaration for the sake of brevity. When the query becomes longer, as you see here:

    var query =
        from    c in Customers
        where   c.Country == "Italy"
        orderby c.Name
        select  new { c.Name, c.City };

the generated code is longer too:

    var query =
            Customers
            .Where( c => c.Country == "Italy" )
            .OrderBy( c => c.Name )
            .Select( c => new { c.Name, c.City } );

As you can see, the code apparently calls instance members on the object returned from the previous call. You will see that this apparent behavior is regulated by the extension methods feature of the host language (C# in this case). The implementation of the Where, OrderBy, and Select methods (called by the sample query) depends on the type of Customers and on the namespaces specified in previous using statements. Extension methods are a fundamental syntax feature that LINQ uses to operate on different data domains with the same syntax.

More Info

An extension method appears to extend a class (the class of Customers in our examples), but in reality an external method receives the instance of the class that seems to be extended as the first argument. The var keyword used to declare query infers the variable type declaration from the initial assignment, which in this case will return an IEnumerable<T> type. Further descriptions of these and other language extensions are contained in Chapter 2, “C# Language Features,” and Chapter 3, “Visual Basic 9.0 Language Features.”
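A small sketch may help to show what "appears to extend a class" means in practice. LongerThan is a hypothetical extension method invented for this example. Where is the standard operator, defined as a static method of System.Linq.Enumerable. In both cases, the instance-style call and the plain static call are the same call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceExtensions
{
    // Hypothetical extension method: the 'this' modifier on the first
    // parameter lets callers write source.LongerThan( n ).
    public static IEnumerable<string> LongerThan(
        this IEnumerable<string> source, int length )
    {
        foreach ( string s in source ) {
            if ( s.Length > length ) yield return s;
        }
    }
}

public static class ExtensionMethodDemo
{
    public static void Main()
    {
        string[] names = { "Marco", "Al", "James" };

        // Extension syntax and plain static syntax for the same method.
        IEnumerable<string> a = names.LongerThan( 2 );
        IEnumerable<string> b = SequenceExtensions.LongerThan( names, 2 );

        // The standard Where operator follows the same pattern.
        IEnumerable<string> c = names.Where( n => n.Length > 2 );
        IEnumerable<string> d = Enumerable.Where( names, n => n.Length > 2 );

        Console.WriteLine( a.SequenceEqual( b ) );   // True
        Console.WriteLine( c.SequenceEqual( d ) );   // True
        Console.WriteLine( String.Join( ", ", a.ToArray() ) );   // Marco, James
    }
}
```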

Another important concept is the timing of operations over data. In general, a LINQ query is not really executed until there is access to the query result, because it describes a set of operations that will be performed when necessary. The access to a query result does the real work. This can be illustrated in the case of a foreach loop:

    var query = from c in Customers ...
    foreach ( string name in query ) ...

There are also methods that iterate a LINQ query result, producing a persistent copy of data in memory. For example, the ToList method produces a typed List<T> collection:

    var query = from c in Customers ...
    List<Customer> customers = query.ToList();

When the LINQ query operates on data that is on a relational database (such as Microsoft SQL Server), the LINQ query generates an equivalent SQL statement instead of operating with in-memory copies of data tables. The query execution on the database is delayed until the first access to the query result. Therefore, if in the last two examples Customers was a Table<Customer> type (a physical table in a relational database) or an ObjectQuery<Customer> type (a conceptual entity mapped to a relational database), the equivalent SQL query would not be sent to the database until the foreach loop was executed or the ToList method was called. The LINQ query can be manipulated and composed in different ways until that time.
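The deferred behavior just described can be observed with in-memory data as well. In the following sketch (the class name, method name, and sample data are assumptions made for illustration), the source list is modified after the query has been defined. The copy produced by ToList does not see the new element, whereas the query itself does, because it runs again each time its result is accessed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns { number of items in the ToList copy,
    //           number of items seen when the query runs afterwards }.
    public static int[] SnapshotAndLiveCounts()
    {
        List<string> cities = new List<string> { "Torino", "Dallas" };

        // Defining the query does not execute it.
        var query =
            from   c in cities
            where  c.Length == 6
            select c;

        // ToList executes the query now and stores a copy of the result.
        List<string> snapshot = query.ToList();

        // Change the source after the query has been defined.
        cities.Add( "Milano" );

        // Count() executes the query again, over the current list content.
        return new int[] { snapshot.Count, query.Count() };
    }

    public static void Main()
    {
        int[] counts = SnapshotAndLiveCounts();
        Console.WriteLine( counts[0] );   // 2
        Console.WriteLine( counts[1] );   // 3
    }
}
```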

Relational Model vs. Hierarchical/Graph Model

At first sight, LINQ might appear to be just another SQL dialect. This similarity has its roots in the way a LINQ query can describe a relationship between entities such as an SQL join:

    var query =
        from   c in Customers
        join   o in Orders
               on c.CustomerID equals o.CustomerID
        select new { c.CustomerID, c.CompanyName, o.OrderID };

This is similar to the regular way of querying data in a relational model. However, LINQ is not limited to a single data domain like the relational model is. In a hierarchical model, suppose that each customer has its own set of orders, and each order has its own list of products. In LINQ, we can get the list of products ordered by each customer in this way:

    var query =
        from   c in Customers
        from   o in c.Orders
        select new { c.Name, o.Quantity, o.Product.ProductName };

The previous query contains no joins. The relationship between Customers and Orders is expressed by the second from clause, which uses c.Orders to say “get all Orders of the c Customer.” The relationship between Orders and Products is expressed by the Product member of the Order instance. The result projects the product name for each order row using o.Product.ProductName.

Hierarchical relationships are expressed in type definitions through references to other objects. To support the previous query, we would have classes similar to those in Listing 1-1.

Listing 1-1: Type declarations with simple relationships

    public class Customer {
        public string Name;
        public string City;
        public Order[] Orders;
    }
    public struct Order {
        public int Quantity;
        public Product Product;
    }
    public class Product {
        public int IdProduct;
        public decimal Price;
        public string ProductName;
    }
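As a sketch of how these declarations support the hierarchical query shown earlier, the following program builds a small object graph and navigates it with two from clauses. The three types are those of Listing 1-1. The HierarchyDemo class, the SampleCustomers method, and the data are assumptions made for illustration.

```csharp
using System;
using System.Linq;

public class Customer {
    public string Name;
    public string City;
    public Order[] Orders;
}
public struct Order {
    public int Quantity;
    public Product Product;
}
public class Product {
    public int IdProduct;
    public decimal Price;
    public string ProductName;
}

public static class HierarchyDemo
{
    // Sample graph invented for this sketch: two customers, three orders.
    public static Customer[] SampleCustomers()
    {
        Product pasta = new Product { IdProduct = 1, Price = 10m, ProductName = "Pasta" };
        Product oil = new Product { IdProduct = 3, Price = 30m, ProductName = "Olive oil" };

        return new Customer[] {
            new Customer { Name = "Marco", City = "Torino", Orders = new Order[] {
                new Order { Quantity = 3, Product = pasta },
                new Order { Quantity = 5, Product = oil } } },
            new Customer { Name = "James", City = "Dallas", Orders = new Order[] {
                new Order { Quantity = 10, Product = oil } } }
        };
    }

    public static void Main()
    {
        Customer[] Customers = SampleCustomers();

        // No join: the second from clause walks the Orders of each customer.
        var query =
            from   c in Customers
            from   o in c.Orders
            select new { c.Name, o.Quantity, o.Product.ProductName };

        foreach ( var row in query ) {
            Console.WriteLine( "{0} {1} {2}", row.Name, row.Quantity, row.ProductName );
        }
    }
}
```

The program prints one line per order (three lines with this data), each carrying the name of the customer who owns the order.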

However, chances are that we want to use the same Product instance for many different Orders of the same product. We probably also want to filter Orders or Products without accessing them through Customer. A common scenario is the one shown in Listing 1-2.

Listing 1-2: Type declarations with two-way relationships

    public class Customer {
        public string Name;
        public string City;
        public Order[] Orders;
    }
    public struct Order {
        public int Quantity;
        public Product Product;
        public Customer Customer;
    }
    public class Product {
        public int IdProduct;
        public decimal Price;
        public string ProductName;
        public Order[] Orders;
    }

By having an array of all products declared as follows:

 Product[] products;

we can query the graph of objects, asking for the list of orders for the single product with an ID equal to 3:

    var query =
        from   p in products
        where  p.IdProduct == 3
        from   o in p.Orders
        select o;

With the same query language, we are querying different data models. When you do not have a relationship defined between the entities used in the query, you can always rely on subqueries and joins that are available in LINQ syntax just as in an SQL language. However, when your data model already defines entity relationships, you can leverage them, avoiding replication (with possible mistakes) of the same information in many places.

If you have entity relationships in your data model, you can still use explicit relationships in a LINQ query. This is useful when you want to force some condition, or when you simply want to relate entities that do not have native relationships. For example, imagine that you want to find customers and suppliers who live in the same city. Your data model might not provide an explicit relationship between these attributes, but you can always write the following:

    var query =
        from   c in Customers
        join   s in Suppliers
               on c.City equals s.City
        select new { c.City, c.Name, SupplierName = s.Name };

And something like the following will be returned:

    City=Torino     Name=Marco      SupplierName=Trucker
    City=Dallas     Name=James      SupplierName=FastDelivery
    City=Dallas     Name=James      SupplierName=Horizon
    City=Seattle    Name=Frank      SupplierName=WayFaster

If you have experience using SQL queries, you probably assume that a query result is always a “rectangular” table, one that repeats the data of some columns many times in a join like the previous one. However, often a query contains several entities with one or more one-to-many relationships. With LINQ, you can write queries that return a hierarchy or graph of objects like the following one:

    var query =
        from   c in Customers
        join   s in Suppliers
               on c.City equals s.City
               into customerSuppliers
        select new { c.City, c.Name, customerSuppliers };

The last query returns a row for each customer, each containing a list of suppliers available in the same city as the customer. This result can be queried again, just as any other object graph with LINQ. Here is how the hierarchized results might appear:

    City=Torino     Name=Marco      customerSuppliers=...
      customerSuppliers: Name=Trucker         City=Torino
    City=Dallas     Name=James      customerSuppliers=...
      customerSuppliers: Name=FastDelivery    City=Dallas
      customerSuppliers: Name=Horizon         City=Dallas
    City=Seattle    Name=Frank      customerSuppliers=...
      customerSuppliers: Name=WayFaster       City=Seattle
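The difference between the flat join and the join with an into clause can be reproduced with a short program. The sample data below is reconstructed from the results shown above. The Customer and Supplier classes are reduced to the two fields the queries use, and the JoinDemo class is an assumption made for illustration.

```csharp
using System;
using System.Linq;

// Reduced, hypothetical versions of the entities: only Name and City.
public class Customer { public string Name; public string City; }
public class Supplier { public string Name; public string City; }

public static class JoinDemo
{
    public static readonly Customer[] Customers = {
        new Customer { Name = "Marco", City = "Torino" },
        new Customer { Name = "James", City = "Dallas" },
        new Customer { Name = "Frank", City = "Seattle" }
    };

    public static readonly Supplier[] Suppliers = {
        new Supplier { Name = "Trucker", City = "Torino" },
        new Supplier { Name = "FastDelivery", City = "Dallas" },
        new Supplier { Name = "Horizon", City = "Dallas" },
        new Supplier { Name = "WayFaster", City = "Seattle" }
    };

    public static void Main()
    {
        // Flat join: one row per customer/supplier pair (four rows here).
        var flat =
            from   c in Customers
            join   s in Suppliers
                   on c.City equals s.City
            select new { c.City, c.Name, SupplierName = s.Name };

        foreach ( var row in flat ) {
            Console.WriteLine( "City={0} Name={1} SupplierName={2}",
                               row.City, row.Name, row.SupplierName );
        }

        // Join with into: one row per customer (three rows here),
        // each carrying its own sequence of matching suppliers.
        var grouped =
            from   c in Customers
            join   s in Suppliers
                   on c.City equals s.City
                   into customerSuppliers
            select new { c.City, c.Name, customerSuppliers };

        foreach ( var row in grouped ) {
            Console.WriteLine( "City={0} Name={1}", row.City, row.Name );
            foreach ( Supplier s in row.customerSuppliers ) {
                Console.WriteLine( "  customerSuppliers: Name={0} City={1}",
                                   s.Name, s.City );
            }
        }
    }
}
```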

If you want to get a list of customers and provide each customer with the list of products that customer ordered at least once and the list of suppliers in the same city, you can write a query like this:

    var query =
        from   c in Customers
        select new {
            c.City,
            c.Name,
            Products = (from   o in c.Orders
                        select new { o.Product.IdProduct,
                                     o.Product.Price }).Distinct(),
            CustomerSuppliers = from   s in Suppliers
                                where  s.City == c.City
                                select s };

You can take a look at the result for a couple of customers to understand how data is returned from the previous single LINQ query:

    City=Torino     Name=Marco      Products=...    CustomerSuppliers=...
      Products: IdProduct=1   Price=10
      Products: IdProduct=3   Price=30
      CustomerSuppliers: Name=Trucker         City=Torino
    City=Dallas     Name=James      Products=...    CustomerSuppliers=...
      Products: IdProduct=3   Price=30
      CustomerSuppliers: Name=FastDelivery    City=Dallas
      CustomerSuppliers: Name=Horizon         City=Dallas

This type of result would be hard to obtain with one or more SQL queries, because it would require an analysis of query results to build the desired object graph. LINQ offers an easy way to move your data from one model to another and different ways to get the same results.
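One detail of the nested query is worth isolating: Distinct removes duplicate products even though each projected item is a new anonymous object. This works because C# anonymous types compare by the values of their properties rather than by reference. The following sketch shows the effect on its own. The reduced Product and Order classes, the DistinctDemo class, and the data are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Reduced, hypothetical versions of the entities used by the subquery.
public class Product { public int IdProduct; public decimal Price; }
public class Order { public int Quantity; public Product Product; }

public static class DistinctDemo
{
    // Counts the distinct (IdProduct, Price) pairs found in the orders.
    public static int DistinctProductCount( Order[] orders )
    {
        var products =
            (from   o in orders
             select new { o.Product.IdProduct,
                          o.Product.Price }).Distinct();
        return products.Count();
    }

    public static void Main()
    {
        // Product 3 is ordered twice, through two separate Product instances.
        Order[] orders = {
            new Order { Quantity = 2, Product = new Product { IdProduct = 1, Price = 10m } },
            new Order { Quantity = 4, Product = new Product { IdProduct = 3, Price = 30m } },
            new Order { Quantity = 6, Product = new Product { IdProduct = 3, Price = 30m } }
        };

        Console.WriteLine( DistinctProductCount( orders ) );   // 2
    }
}
```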

LINQ requires you to describe your data in terms of entities that are also types in the language. When you build a LINQ query, it is always a set of operations on instances of some classes. These objects might be the real container of data, or they might be a simple description (in terms of metadata) of the external entity you are going to manipulate. A query can be sent to a database through an SQL command only if it is applied to a set of types that map tables and relationships contained in the database. After you have defined entity classes, you can use both approaches we described (joins and entity relationship navigation). The conversion of all these operations into SQL commands is the responsibility of the LINQ engine.

Note

You can create entity classes by using code-generation tools such as SQLMetal or the LINQ to SQL Designer in Microsoft Visual Studio.

In Listing 1-3, you can see an example of a Product class that maps a relational table named Products, with five columns that correspond to public data members.

Listing 1-3: Class declaration mapped on a database table

    [Table(Name="Products")]
    public class Product {
        [Column(IsPrimaryKey=true)] public int IdProduct;
        [Column(Name="UnitPrice")] public decimal Price;
        [Column()] public string ProductName;
        [Column()] public bool Taxable;
        [Column()] public decimal Tax;
    }

When you work on entities that describe external data (such as database tables), you can create instances of these kinds of classes and manipulate in-memory objects just as if data from all tables were loaded in memory. These changes are submitted to the database through SQL commands when you call the SubmitChanges method, as you can see in Listing 1-4.

Listing 1-4: Database update calling the SubmitChanges method

    var taxableProducts =
        from   p in db.Products
        where  p.Taxable == true
        select p;
    foreach( Product product in taxableProducts ) {
        RecalculateTaxes( product );
    }
    db.SubmitChanges();

The Product class in the preceding example represents a row in the Products table of an external database. When SubmitChanges is called, all changed objects generate an SQL command to update the corresponding rows in the table.

More Info

Class entities that match tables and relationships in the database are further described in Chapter 5, “LINQ to ADO.NET.”

XML Manipulation

LINQ has a different set of classes and extensions to support the manipulation of XML data. We will create some examples using the following scenario. Imagine that your customers are able to send orders using XML files like the ORDERS.XML file shown in Listing 1-5.

Listing 1-5: A fragment of an XML file of orders

    <?xml version="1.0" encoding="utf-8" ?>
    <orders xmlns="http://schemas.devleap.com/Orders">
        <order idCustomer="ALFKI" idProduct="1" quantity="10" price="20.59"/>
        <order idCustomer="ANATR" idProduct="5" quantity="20" price="12.99"/>
        <order idCustomer="KOENE" idProduct="7" quantity="15" price="35.50"/>
    </orders>

Using standard Microsoft .NET 2.0 System.Xml classes, you can load the file using a DOM approach or you can parse its contents using an XmlReader implementation. Regardless of the solution you choose, you must always consider nodes, node types, XML namespaces, and whatever else is related to the XML world. Many developers do not like working with XML because it requires the knowledge of another domain of data structures and uses syntax of its own.

If you need to extract all the products ordered with their quantities, you can parse the orders file using an XmlReader to accomplish this, as shown in Listing 1-6.

Listing 1-6: Reading the XML file of orders using an XmlReader

    String nsUri = "http://schemas.devleap.com/Orders";
    XmlReader xmlOrders = XmlReader.Create( "Orders.xml" );
    List<Order> orders = new List<Order>();
    Order order = null;
    while (xmlOrders.Read()) {
        switch (xmlOrders.NodeType) {
            case XmlNodeType.Element:
                if ((xmlOrders.Name == "order") &&
                (xmlOrders.NamespaceURI == nsUri)) {
                    order = new Order();
                    order.CustomerID = xmlOrders.GetAttribute( "idCustomer" );
                    order.Product = new Product();
                    order.Product.IdProduct =
                        Int32.Parse( xmlOrders.GetAttribute( "idProduct" ) );
                    order.Product.Price =
                        Decimal.Parse( xmlOrders.GetAttribute( "price" ) );
                    order.Quantity =
                        Int32.Parse( xmlOrders.GetAttribute( "quantity" ) );
                    orders.Add( order );
                }
                break;
        }
    }

You could also use an XQuery like the following one to select nodes:

    for $order in doc("Orders.xml")/orders/order
    return $order

However, XQuery also requires learning another language and syntax. Moreover, the result of the previous XQuery sample should be converted into a set of Order instances to be used within our code. Finally, for many developers it is not very intuitive. As we have already said, LINQ provides a query engine suitable for any kind of source, even an XML document. By using LINQ queries, you can achieve the same result with less effort and with unified programming language syntax. Listing 1-7 shows a LINQ to XML query made over the orders file.

Listing 1-7: Reading the XML file using LINQ to XML

    XDocument xmlOrders = XDocument.Load( "Orders.xml" );
    XNamespace ns = "http://schemas.devleap.com/Orders";
    var orders = from o in xmlOrders.Root.Elements( ns + "order" )
                 select new Order {
                            CustomerID = (String)o.Attribute( "idCustomer" ),
                            Product = new Product {
                                IdProduct = (Int32)o.Attribute("idProduct"),
                                Price = (Decimal)o.Attribute("price") },
                            Quantity = (Int32)o.Attribute("quantity")
                        };
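To try the query of Listing 1-7 without an Orders.xml file on disk, you can parse the XML from a string. The following sketch does so, using the fragment of Listing 1-5 as inline data. The Order and Product classes are reduced to the members the query assigns, and the XmlOrdersDemo class and LoadOrders method are assumptions made for illustration. Note that the explicit conversions on XAttribute follow XML lexical rules, so "20.59" is read the same way regardless of the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Reduced, hypothetical versions of the entities used by the query.
public class Product { public int IdProduct; public decimal Price; }
public class Order
{
    public string CustomerID;
    public Product Product;
    public int Quantity;
}

public static class XmlOrdersDemo
{
    // Inline copy of the Listing 1-5 fragment, so that no file is needed.
    public const string OrdersXml =
@"<orders xmlns=""http://schemas.devleap.com/Orders"">
    <order idCustomer=""ALFKI"" idProduct=""1"" quantity=""10"" price=""20.59""/>
    <order idCustomer=""ANATR"" idProduct=""5"" quantity=""20"" price=""12.99""/>
    <order idCustomer=""KOENE"" idProduct=""7"" quantity=""15"" price=""35.50""/>
</orders>";

    public static List<Order> LoadOrders( string xml )
    {
        XDocument xmlOrders = XDocument.Parse( xml );
        XNamespace ns = "http://schemas.devleap.com/Orders";

        var orders = from o in xmlOrders.Root.Elements( ns + "order" )
                     select new Order {
                         CustomerID = (string)o.Attribute( "idCustomer" ),
                         Product = new Product {
                             IdProduct = (int)o.Attribute( "idProduct" ),
                             Price = (decimal)o.Attribute( "price" ) },
                         Quantity = (int)o.Attribute( "quantity" )
                     };

        // ToList runs the query and materializes the Order instances.
        return orders.ToList();
    }

    public static void Main()
    {
        foreach ( Order order in LoadOrders( OrdersXml ) ) {
            Console.WriteLine( "{0}: product {1}, quantity {2}",
                order.CustomerID, order.Product.IdProduct, order.Quantity );
        }
    }
}
```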

Using the new Microsoft Visual Basic 9.0 syntax, you can reference XML nodes in your code by using an XPath-like syntax, as shown in Listing 1-8.

Listing 1-8: Reading the XML file using LINQ to XML and Visual Basic 9.0 syntax

    Imports <xmlns:o="http://schemas.devleap.com/Orders">
    ' ...
    Dim xmlOrders As XDocument = XDocument.Load("Orders.xml")
    Dim orders = _
        From o In xmlOrders.<o:orders>.<o:order> _
        Select New Order With { _
            .CustomerID = o.@idCustomer, _
            .Product = New Product With { _
                .IdProduct = o.@idProduct, _
                .Price = o.@price}, _
            .Quantity = o.@quantity}

The result of these LINQ to XML queries could be used to transparently load a list of Order entities into a customer Orders property, using LINQ to SQL to submit the changes into the physical database layer:

    customer.Orders.AddRange( _
        From o In xmlOrders.<o:orders>.<o:order> _
        Where o.@idCustomer = customer.CustomerID _
        Select New Order With { _
            .CustomerID = o.@idCustomer, _
            .Product = New Product With { _
                .IdProduct = o.@idProduct, _
                .Price = o.@price}, _
            .Quantity = o.@quantity})

If you need to generate an ORDERS.XML file starting from your customer’s orders, you can at least leverage Visual Basic 9.0 XML literals to define the output’s XML structure. This is shown in Listing 1-9.

Listing 1-9: Creating the XML of orders using Visual Basic 9.0 XML literals

    Dim xmlOrders = <o:orders>
        <%= From o In orders _
            Select <o:order idCustomer=<%= o.CustomerID %>
                        idProduct=<%= o.Product.IdProduct %>
                        quantity=<%= o.Quantity %>
                        price=<%= o.Product.Price %>/> %>
        </o:orders>

You can appreciate the power of this solution, which keeps the XML syntax without losing the stability of typed code and transforms a set of entities selected via LINQ to SQL into an XML InfoSet.
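C# has no XML literals, but a comparable result can be obtained with the functional construction style of the LINQ to XML classes, in which a query is passed as the content of an element. The following is a sketch under stated assumptions: the reduced Order and Product classes, the XmlOutputDemo class, and the ToXml method are invented for illustration. Unlike the Visual Basic listing, this version lets the serializer emit a default namespace declaration instead of the o prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Reduced, hypothetical versions of the entities used by the query.
public class Product { public int IdProduct; public decimal Price; }
public class Order
{
    public string CustomerID;
    public Product Product;
    public int Quantity;
}

public static class XmlOutputDemo
{
    // Builds an orders element with one order child per Order instance.
    public static XElement ToXml( IEnumerable<Order> orders )
    {
        XNamespace ns = "http://schemas.devleap.com/Orders";

        return new XElement( ns + "orders",
            from   o in orders
            select new XElement( ns + "order",
                new XAttribute( "idCustomer", o.CustomerID ),
                new XAttribute( "idProduct", o.Product.IdProduct ),
                new XAttribute( "quantity", o.Quantity ),
                new XAttribute( "price", o.Product.Price ) ) );
    }

    public static void Main()
    {
        // Sample data taken from the first row of Listing 1-5.
        List<Order> orders = new List<Order> {
            new Order {
                CustomerID = "ALFKI",
                Product = new Product { IdProduct = 1, Price = 20.59m },
                Quantity = 10 }
        };

        Console.WriteLine( ToXml( orders ) );
    }
}
```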

More Info

You will find more information about LINQ to XML syntax and its potential in Chapter 6, “LINQ to XML.”