Query Operators | Introducing Microsoft® LINQ

The remaining sections of this chapter describe the main methods and generic delegates provided by the System.Linq namespace to query items with LINQ.

The Where Operator

Imagine that you need to list the names and cities of customers from Italy. To filter a set of items, you can use the Where operator, which is also called a restriction operator because it restricts a set of items. Listing 4-3 shows a simple example.

Listing 4-3: A query with a restriction

    var expr =
        from   c in customers
        where  c.Country == Countries.Italy
        select new { c.Name, c.City };

Here are the signatures of the Where operator:

    public static IEnumerable<T> Where<T>(
        this IEnumerable<T> source,
        Func<T, bool> predicate);
    public static IEnumerable<T> Where<T>(
        this IEnumerable<T> source,
        Func<T, int, bool> predicate);

As you can see, two signatures are available. In Listing 4-3, we used the first signature, which enumerates the items of the source sequence and yields those that satisfy the predicate (c.Country == Countries.Italy). The second signature passes an additional argument of type int to the predicate: the zero-based index of the element within the source sequence. Keep in mind that if you pass a null source or a null predicate, an ArgumentNullException will be thrown. You can use the index parameter to start filtering from a particular index, as shown in Listing 4-4.

Listing 4-4: A query with a restriction and an index-based filter

    var expr =
        customers
        .Where((c, index) => (c.Country == Countries.Italy && index >= 1))
        .Select(c => c.Name);

Important

In Listing 4-4, we cannot use the LINQ query syntax because the Where version that we want to call is not supported by an equivalent LINQ query clause. We will use both syntaxes from here onward.

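The result of Listing 4-4 will be the list of names of the Italian customers, excluding the customer at index 0. Because the index refers to the position in the source sequence, not to the position among the filtered items, the first Italian customer is skipped only if it is also the first element of customers. The capability to filter items of the source sequence by using their positional index is useful when you want to extract a specific page of data from a large sequence of items. Listing 4-5 shows an example.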

Listing 4-5: A query with a paging restriction

    int start = 5;
    int end = 10;
    var expr =
        customers
        .Where((c, index) => ((index >= start) && (index < end)))
        .Select(c => c.Name);

Keep in mind that it is generally not a good practice to store large sequences of data loaded from a database persistence layer in memory; usually, it is better to page data at the persistence layer level. Therefore, use this paging technique only if you have already loaded data into memory. Reloading the current page from a persistence layer is less efficient than directly accessing the sequence already loaded “in memory.”
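To make the role of the index argument concrete, here is a minimal sketch written with top-level statements (C# 9 or later). The tuple array is hypothetical sample data standing in for the chapter's Customer objects, and the country is a plain string rather than the Countries enumeration.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: (Name, Country) tuples stand in for Customer objects.
var customers = new[]
{
    (Name: "Paolo", Country: "Italy"),
    (Name: "Marco", Country: "Italy"),
    (Name: "James", Country: "USA"),
    (Name: "Frank", Country: "USA"),
    (Name: "Anna",  Country: "Italy"),
};

// The index passed to the predicate is the position in the SOURCE sequence,
// not the position among the elements that pass the filter.
var italiansAfterFirst = customers
    .Where((c, index) => c.Country == "Italy" && index >= 1)
    .Select(c => c.Name)
    .ToArray();

// Index-based paging: keep the elements at source positions 1, 2, and 3.
int start = 1;
int end = 4;
var page = customers
    .Where((c, index) => index >= start && index < end)
    .Select(c => c.Name)
    .ToArray();

Console.WriteLine(string.Join(", ", italiansAfterFirst)); // Marco, Anna
Console.WriteLine(string.Join(", ", page));               // Marco, James, Frank
```

Note that the paging query still enumerates the whole source sequence; the predicate simply rejects every item outside the requested range.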

Projection Operators

The following sections describe how to use projection operators. These operators are used to select (or “project”) contents from the source enumeration into the result.

Select

In Listing 4-3, you saw an example of defining the result of the query by using the Select operator. The signatures for the Select operator are shown here:

    public static IEnumerable<S> Select<T, S>(
        this IEnumerable<T> source,
        Func<T, S> selector);
    public static IEnumerable<S> Select<T, S>(
        this IEnumerable<T> source,
        Func<T, int, S> selector);

The Select operator is one of the projection operators because it projects the query results, making them available through an object that implements IEnumerable<S>. This object enumerates the items produced by the selector function. Like the Where operator, Select enumerates the source sequence and yields the result of applying the selector to each item. Consider the following expression:

 var expr = customers.Select(c => c.Name);

The result of this expression will be a sequence of customer names (IEnumerable<string>). Now consider this example:

 var expr = customers.Select(c => new { c.Name, c.City });

This expression projects a sequence of an anonymous type, defined as a tuple of Name and City, for each customer object. With the second overload of Select, the selector also receives an argument of type int. This argument is the zero-based index of each item within the source sequence.
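The indexed overload of Select can be sketched as follows. This is only an illustration on a hypothetical array of names, written with top-level statements (C# 9 or later); the Position and Name property names are our own choice.

```csharp
using System;
using System.Linq;

// Hypothetical sample data.
var names = new[] { "Paolo", "Marco", "James" };

// The selector receives each item together with its zero-based source index.
var numbered = names
    .Select((name, index) => new { Position = index, Name = name })
    .ToArray();

foreach (var item in numbered)
    Console.WriteLine($"{item.Position}: {item.Name}");
// 0: Paolo
// 1: Marco
// 2: James
```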

SelectMany

Imagine that you want to select all the orders of customers from Italy. You could write the query shown in Listing 4-6 using method syntax.

Listing 4-6: The list of orders made by Italian customers

    var orders =
        customers
        .Where(c => c.Country == Countries.Italy)
        .Select(c => c.Orders);

    foreach(var item in orders) {
        Console.WriteLine(item);
    }

Because of the behavior of the Select operator, the resulting type of this query will be IEnumerable<Order[]>, where each item in the resulting sequence represents the array of orders of a single customer. In fact, the Orders property of a Customer instance is of type Order[]. The output of the code in Listing 4-6 would be the following:

    DevLeap.Linq.Ch4.Operators.Order[]
    DevLeap.Linq.Ch4.Operators.Order[]

To have a “flat” IEnumerable<Order> result type, we need to use the SelectMany operator:

    public static IEnumerable<S> SelectMany<T, S>(
        this IEnumerable<T> source,
        Func<T, IEnumerable<S>> selector);
    public static IEnumerable<S> SelectMany<T, S>(
        this IEnumerable<T> source,
        Func<T, int, IEnumerable<S>> selector);
    public static IEnumerable<S> SelectMany<T, C, S>(
        this IEnumerable<T> source,
        Func<T, IEnumerable<C>> collectionSelector,
        Func<T, C, S> resultSelector);

This operator enumerates the source sequence, applies the selector to each item, and merges the resulting sequences, providing their items as a single enumerable sequence. The second overload is analogous to the equivalent overload of Select: its selector also receives the zero-based index of each source item. Listing 4-7 shows an example.

Listing 4-7: The flattened list of orders made by Italian customers

    IEnumerable<Order> orders =
        customers
        .Where(c => c.Country == Countries.Italy)
        .SelectMany(c => c.Orders);

Using the query expression syntax, the query in Listing 4-7 can be written with the code shown in Listing 4-8.

Listing 4-8: The flattened list of orders made by Italian customers, written with a query expression

    IEnumerable<Order> orders =
        from   c in customers
        where  c.Country == Countries.Italy
            from   o in c.Orders
            select o;

In a query expression, every from clause after the initial one is translated into an invocation of SelectMany. In other words, every time you see a query expression with more than one from clause, you can apply this rule: the first from clause defines the source sequence, and each additional from clause becomes a call to SelectMany. When the final select immediately follows that from clause, its projection becomes the resultSelector argument of the same call.

The third overload of SelectMany is useful whenever you need to select a custom result from the source set of sequences instead of simply merging their items, as with the two previous overloads. This overload invokes the collectionSelector function on each item of the source sequence and yields the result of the resultSelector function, applied to each source item paired with each item of the collection selected for it. In Listing 4-9, you can see an example of this method, used to extract a new anonymous type made from the Quantity and IdProduct of each order of Italian customers.

Listing 4-9: The list of Quantity and IdProduct of orders made by Italian customers

    var items = customers
        .Where(c => c.Country == Countries.Italy)
        .SelectMany(c => c.Orders,
            (c, o) => new {o.Quantity, o.IdProduct});

The query in Listing 4-9 can be written with the query expression shown in Listing 4-10.

Listing 4-10: The list of Quantity and IdProduct of orders made by Italian customers, written with a query expression

    var items =
        from   c in customers
        where  c.Country == Countries.Italy
            from   o in c.Orders
            select new {o.Quantity, o.IdProduct};
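The difference between Select and SelectMany, and the translation of a second from clause, can be sketched on a small data set. The tuples below are hypothetical stand-ins for Customer, each order is reduced to a bare quantity, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: each customer carries an array of order quantities.
var customers = new[]
{
    (Name: "Paolo", Country: "Italy", Orders: new[] { 3, 5 }),
    (Name: "Marco", Country: "Italy", Orders: new[] { 10, 20 }),
    (Name: "James", Country: "USA",   Orders: new[] { 20 }),
};

// Select keeps the nesting: one int[] per Italian customer.
IEnumerable<int[]> nested = customers
    .Where(c => c.Country == "Italy")
    .Select(c => c.Orders);

// SelectMany merges the inner arrays into one flat sequence.
int[] flat = customers
    .Where(c => c.Country == "Italy")
    .SelectMany(c => c.Orders)
    .ToArray();

// Third overload: the result selector sees each parent with each child.
var labelled = customers
    .Where(c => c.Country == "Italy")
    .SelectMany(c => c.Orders, (c, quantity) => $"{c.Name}:{quantity}")
    .ToArray();

// The second from clause below is compiled into the same SelectMany call.
var fromQuery =
    (from c in customers
     where c.Country == "Italy"
     from quantity in c.Orders
     select $"{c.Name}:{quantity}").ToArray();

Console.WriteLine(nested.Count());                    // 2
Console.WriteLine(string.Join(", ", flat));           // 3, 5, 10, 20
Console.WriteLine(string.Join(", ", labelled));       // Paolo:3, Paolo:5, Marco:10, Marco:20
Console.WriteLine(labelled.SequenceEqual(fromQuery)); // True
```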

Ordering Operators

Another useful set of operators is the ordering operators group. Ordering operators are used to determine the ordering and direction of elements in output sequences.

OrderBy and OrderByDescending

Sometimes it is helpful to apply an order to the results of a database query. LINQ can order the results of queries, in ascending or descending order, by using ordering operators, just as we do in SQL syntax. For instance, if you need to select the Name and City of all Italian customers in descending order by Name, you can write the corresponding query expression shown in Listing 4-11.

Listing 4-11: A query expression with a descending orderby clause

    var expr =
        from    c in customers
        where   c.Country == Countries.Italy
        orderby c.Name descending
        select  new { c.Name, c.City };

The query expression syntax will translate the orderby keyword into one of the following ordering extension methods:

    public static IOrderedSequence<T> OrderBy<T, K>(
        this IEnumerable<T> source,
        Func<T, K> keySelector);
    public static IOrderedSequence<T> OrderBy<T, K>(
        this IEnumerable<T> source,
        Func<T, K> keySelector,
        IComparer<K> comparer);
    public static IOrderedSequence<T> OrderByDescending<T, K>(
        this IEnumerable<T> source,
        Func<T, K> keySelector);
    public static IOrderedSequence<T> OrderByDescending<T, K>(
        this IEnumerable<T> source,
        Func<T, K> keySelector,
        IComparer<K> comparer);

As you can see, the two main extension methods, OrderBy and OrderByDescending, both have two overloads. The methods’ names suggest their objective: OrderBy is for ascending order, and OrderByDescending is for descending order. The keySelector argument represents a function that extracts a key, of type K, from each item of type T taken from the source sequence. The extracted key is the value that the comparer compares while ordering. Both methods have an overload that allows you to provide a custom comparer. If no comparer is provided or the comparer argument is null, a default comparer is used (Comparer<K>.Default). It is important to emphasize that these ordering methods return not just IEnumerable<T> but IOrderedSequence<T>, an interface that extends IEnumerable<T>. (In the released .NET Framework 3.5, this interface is named IOrderedEnumerable<T>.)

The code sample in Listing 4-11 will be translated to the following:

    var expr =
        customers
        .Where(c => c.Country == Countries.Italy)
        .OrderByDescending(c => c.Name)
        .Select(c => new { c.Name, c.City } );

ThenBy and ThenByDescending

When you need to order data by many different keys, you can take advantage of the ThenBy and ThenByDescending operators. Here are their signatures:

    public static IOrderedSequence<T> ThenBy<T, K>(
        this IOrderedSequence<T> source,
        Func<T, K> keySelector);
    public static IOrderedSequence<T> ThenBy<T, K>(
        this IOrderedSequence<T> source,
        Func<T, K> keySelector,
        IComparer<K> comparer);
    public static IOrderedSequence<T> ThenByDescending<T, K>(
        this IOrderedSequence<T> source,
        Func<T, K> keySelector);
    public static IOrderedSequence<T> ThenByDescending<T, K>(
        this IOrderedSequence<T> source,
        Func<T, K> keySelector,
        IComparer<K> comparer);

These operators have signatures similar to OrderBy and OrderByDescending. The difference is that ThenBy and ThenByDescending can be applied only to IOrderedSequence<T> and not to any IEnumerable<T>. Therefore, you can use the ThenBy or ThenByDescending operator only after a call to OrderBy or OrderByDescending (or after another ThenBy or ThenByDescending). Here is an example:

    var expr = customers
        .Where(c => c.Country == Countries.Italy)
        .OrderByDescending(c => c.Name)
        .ThenBy(c => c.City)
        .Select(c => new { c.Name, c.City } );

In Listing 4-12, you can see the corresponding query expression.

Listing 4-12: A query expression with orderby and thenby

    var expr =
        from    c in customers
        where   c.Country == Countries.Italy
        orderby c.Name descending, c.City
        select  new { c.Name, c.City };
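The effect of a primary and a secondary key can be checked on a small sample. The following sketch uses hypothetical (Name, City) tuples instead of the chapter's Customer type and top-level statements (C# 9 or later); it compares the method syntax with the equivalent orderby clause.

```csharp
using System;
using System.Linq;

// Hypothetical sample data with repeated names, so the second key matters.
var customers = new[]
{
    (Name: "Marco", City: "Torino"),
    (Name: "Paolo", City: "Milano"),
    (Name: "Marco", City: "Brescia"),
    (Name: "Paolo", City: "Brescia"),
};

var ordered = customers
    .OrderByDescending(c => c.Name)   // primary key, descending
    .ThenBy(c => c.City)              // secondary key, ascending
    .Select(c => $"{c.Name}/{c.City}")
    .ToArray();

// The same ordering expressed with a single orderby clause and two keys.
var fromQuery =
    (from c in customers
     orderby c.Name descending, c.City
     select $"{c.Name}/{c.City}").ToArray();

Console.WriteLine(string.Join(", ", ordered));
// Paolo/Brescia, Paolo/Milano, Marco/Brescia, Marco/Torino
Console.WriteLine(ordered.SequenceEqual(fromQuery)); // True
```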

Important

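In the case of multiple occurrences of the same key within a sequence to be ordered, the released LINQ to Objects implementation performs a “stable” sort: items with equal keys keep the relative order they had in the source sequence. A query provider other than LINQ to Objects might not give the same guarantee.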

A custom comparer might be useful when the items in your source sequence need to be ordered using custom logic. For instance, imagine that you want to select all the orders of your customers ordered by month:

    var expr =
        from c in customers
            from    o in c.Orders
            orderby o.Month
            select  o;

If you apply the default comparer to the Month property of the orders, you will get a result alphabetically ordered. The result is wrong because the Month property is just a string and not a number or a date:

    20 - True - December - 3
    20 - True - December - 3
    3 - False - January - 1
    20 - False - July - 5
    10 - False - July - 1
    5 - True - May - 2

You should use a custom MonthComparer that correctly compares months:

    using System.Globalization;

    private class MonthComparer: IComparer<string> {
        public int Compare(string x, string y) {
            DateTime xDate = DateTime.ParseExact(x, "MMMM", new CultureInfo("en-US"));
            DateTime yDate = DateTime.ParseExact(y, "MMMM", new CultureInfo("en-US"));
            return(Comparer<DateTime>.Default.Compare(xDate, yDate));
        }
    }

The newly defined custom MonthComparer could be passed as a parameter while invoking the OrderBy extension method, as in Listing 4-13.

Listing 4-13: A custom comparer used with an OrderBy operator

    IEnumerable<Order> orders =
        customers
        .SelectMany(c => c.Orders)
        .OrderBy(o => o.Month, new MonthComparer());
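As a compact variation on the same idea, the sketch below builds the comparer with Comparer<string>.Create instead of a dedicated class. This is not the chapter's MonthComparer: the MonthNumber helper is a hypothetical name, month names are assumed to be in English, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical sample data: month names as plain strings.
var months = new[] { "July", "December", "May", "January" };

// Hypothetical helper: maps an English month name to its number (1 to 12).
int MonthNumber(string name) =>
    DateTime.ParseExact(name, "MMMM", CultureInfo.InvariantCulture).Month;

// A comparer that orders month names by calendar position.
IComparer<string> byCalendar =
    Comparer<string>.Create((x, y) => MonthNumber(x).CompareTo(MonthNumber(y)));

// Default comparer: plain string ordering.
var alphabetical = months.OrderBy(m => m).ToArray();

// Custom comparer passed as the second argument of OrderBy.
var calendar = months.OrderBy(m => m, byCalendar).ToArray();

Console.WriteLine(string.Join(", ", alphabetical)); // December, January, July, May
Console.WriteLine(string.Join(", ", calendar));     // January, May, July, December
```

A dedicated comparer class remains the better choice when the same ordering logic is reused in several queries.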

Reverse Operator

Sometimes you need to reverse the result of a query, listing the last item in the result first. LINQ provides one more ordering operator, called Reverse, that allows you to perform this operation:

    public static IEnumerable<T> Reverse<T>(
        this IEnumerable<T> source);

The implementation of Reverse is quite simple. It just yields each item in the source sequence in reverse order. Listing 4-14 shows an example of its use.

Listing 4-14: The Reverse operator applied

    var expr =
        customers
        .Where(c => c.Country == Countries.Italy)
        .OrderByDescending(c => c.Name)
        .ThenBy(c => c.City)
        .Select(c => new { c.Name, c.City } )
        .Reverse();

The Reverse operator, like many other operators, does not have a short “alias” in LINQ query expressions. However, we can merge query expression syntax with operators, as shown in Listing 4-15.

Listing 4-15: The Reverse operator applied to a query expression with orderby and thenby

    var expr =
        (from    c in customers
         where   c.Country == Countries.Italy
         orderby c.Name descending, c.City
         select  new { c.Name, c.City }
        ).Reverse();

As you can see, we apply the Reverse operator to the expression resulting from Listing 4-12. Under the covers, the inner query expression is first translated to the resulting list of extension methods, and then the Reverse method is applied. It is just like Listing 4-14, but easier to write.

Grouping Operators

Now you have seen how to select, filter, and order sequences of items. Sometimes when querying contents, you also need to group results based on specific criteria. To realize content groupings, you use a grouping operator.

The GroupBy operator, also called a grouping operator, is the only operator of this family and provides the following overloads:

    public static IEnumerable<IGrouping<K, T>> GroupBy<T, K>(
        this IEnumerable<T> source,
        Func<T, K> keySelector);
    public static IEnumerable<IGrouping<K, T>> GroupBy<T, K>(
        this IEnumerable<T> source,
        Func<T, K> keySelector,
        IEqualityComparer<K> comparer);
    public static IEnumerable<IGrouping<K, E>> GroupBy<T, K, E>(
        this IEnumerable<T> source,
        Func<T, K> keySelector,
        Func<T, E> elementSelector);
    public static IEnumerable<IGrouping<K, E>> GroupBy<T, K, E>(
        this IEnumerable<T> source,
        Func<T, K> keySelector,
        Func<T, E> elementSelector,
        IEqualityComparer<K> comparer);

The first two overloads return IEnumerable<IGrouping<K, T>>, and the overloads with an elementSelector return IEnumerable<IGrouping<K, E>>. The IGrouping<K, T> generic interface is a specialization of IEnumerable<T> that adds a Key property of type K, shared by all the items within the enumeration:

    public interface IGrouping<K, T> : IEnumerable<T> {
        K Key { get; }
    }

From a practical point of view, a type that implements this generic interface is simply a typed enumeration with a Key value that identifies the whole group. All the GroupBy methods work on a source sequence as usual, and they call the keySelector function to extract the Key value from each item to group results based on the different Key values. The elementSelector argument, if present, defines a function that maps the source element within the source sequence to the destination element of the resulting sequence. If you do not specify the elementSelector, elements are mapped directly from the source to the destination. (You will see an example of this later in the chapter, in Listing 4-18.)

The GroupBy method selects pairs of keys and items for each item in source, using the keySelector predicate and, if present, the elementSelector argument. Then it yields a sequence of IGrouping<K, T> objects, where each group consists of a sequence of items with a common Key value. The last optional argument you can pass to the method is a custom comparer, which is useful when you need to compare key values and define group membership. If no custom comparer is provided, the EqualityComparer<K>.Default is used. The order of keys and items within each group corresponds to their occurrence within the source. Listing 4-16 shows an example of using the GroupBy operator.

Listing 4-16: The GroupBy operator used to group customers by Country

    var expr = customers.GroupBy(c => c.Country);

    foreach(IGrouping<Countries, Customer> customerGroup in expr) {
        Console.WriteLine("Country: {0}", customerGroup.Key);
        foreach(var item in customerGroup) {
            Console.WriteLine(item);
        }
    }

As Listing 4-16 shows, you need to enumerate all group keys before iterating over the items contained within each group. Every group is an instance of a type that implements IGrouping<Countries, Customer>, because we are using the default elementSelector that directly projects the source Customer instances into the result. In query expressions, the GroupBy operator can be defined using the group … by … syntax, which is shown in Listing 4-17.

Listing 4-17: A query expression with a group by syntax

    var expr =
        from  c in customers
        group c by c.Country;

    foreach(IGrouping<Countries, Customer> customerGroup in expr) {
        Console.WriteLine("Country: {0}", customerGroup.Key);
        foreach(var item in customerGroup) {
            Console.WriteLine(item);
        }
    }

The code defined in Listing 4-17 is semantically equivalent to the code shown in Listing 4-16.

Listing 4-18 is another example of grouping, this time with a custom elementSelector.

Listing 4-18: The GroupBy operator used to group customer names by Country

    var expr =
        customers
        .GroupBy(c => c.Country, c => c.Name);

    foreach(IGrouping<Countries, string> customerGroup in expr) {
        Console.WriteLine("Country: {0}", customerGroup.Key);
        foreach(var item in customerGroup) {
            Console.WriteLine("  {0}", item);
        }
    }

Here is the result of this code:

    Country: Italy
      Paolo
      Marco
    Country: USA
      James
      Frank

In this last example, each group in the result is an instance of a class that implements IGrouping<Countries, string>, because the elementSelector function projects only the customers’ names (of type string) into the output sequence.
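The grouping behavior described in this section, including the order of keys and items, can be reproduced with a short sketch. The data is a hypothetical array of (Name, Country) tuples with string countries, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical sample data, deliberately interleaving the two countries.
var customers = new[]
{
    (Name: "Paolo", Country: "Italy"),
    (Name: "James", Country: "USA"),
    (Name: "Marco", Country: "Italy"),
    (Name: "Frank", Country: "USA"),
};

// keySelector picks the country; elementSelector keeps only the name.
var byCountry = customers
    .GroupBy(c => c.Country, c => c.Name)
    .ToArray();

// Groups appear in order of first key occurrence; items keep source order.
foreach (var g in byCountry)
    Console.WriteLine($"{g.Key}: {string.Join(", ", g)}");
// Italy: Paolo, Marco
// USA: James, Frank

// The equivalent group ... by ... clause.
var fromQuery =
    (from c in customers
     group c.Name by c.Country).ToArray();

Console.WriteLine(fromQuery.Length); // 2
```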

Join Operators

Join operators are used to define relationships between sequences in LINQ queries. From a SQL and relational point of view, almost every query requires joining two or more tables. In LINQ, a set of join operators is defined to implement this behavior.

Join

The first operator of this group is of course the Join method, which is defined by the following signatures:

    public static IEnumerable<V> Join<T, U, K, V>(
        this IEnumerable<T> outer,
        IEnumerable<U> inner,
        Func<T, K> outerKeySelector,
        Func<U, K> innerKeySelector,
        Func<T, U, V> resultSelector);
    public static IEnumerable<V> Join<T, U, K, V>(
        this IEnumerable<T> outer,
        IEnumerable<U> inner,
        Func<T, K> outerKeySelector,
        Func<U, K> innerKeySelector,
        Func<T, U, V> resultSelector,
        IEqualityComparer<K> comparer);

Join requires a set of four generic types. The T type represents the type of the items in the outer source sequence, and the U type describes the type of the items in the inner source sequence. The functions outerKeySelector and innerKeySelector define how to extract the identifying keys from the outer and inner source sequence items, respectively. These keys are both of type K, and their equality defines the join condition. The resultSelector function defines what to project into the result sequence, which will be an implementation of IEnumerable<V>. V is the last generic type needed by the operator, and it defines the type of each single item in the join result sequence. The second overload of the method has an additional custom equality comparer, used to compare the keys. If the comparer argument is null or if the first overload of the method is invoked, a default key comparer (EqualityComparer<K>.Default) will be used.

Here is an example that will make the use of Join clearer. Think about our customers, with their orders and products. In Listing 4-19, a query joins orders with their corresponding products.

Listing 4-19: The Join operator used to map orders with products

    var expr =
        customers
        .SelectMany(c => c.Orders)
        .Join( products,
               o => o.IdProduct,
               p => p.IdProduct,
               (o, p) => new {o.Month, o.Shipped, p.IdProduct, p.Price });

The following is the result of the query:

    {Month=January, Shipped=False, IdProduct=1, Price=10}
    {Month=May, Shipped=True, IdProduct=2, Price=20}
    {Month=July, Shipped=False, IdProduct=1, Price=10}
    {Month=December, Shipped=True, IdProduct=3, Price=30}
    {Month=January, Shipped=True, IdProduct=3, Price=30}
    {Month=July, Shipped=False, IdProduct=4, Price=40}

In this example, orders represents the outer sequence and products is the inner sequence. The o and p used in lambda expressions are of type Order and Product, respectively. Internally, the operator collects the elements of the inner sequence into a hash table, using their keys extracted with innerKeySelector. It then enumerates the outer sequence and maps its elements, based on the Key value extracted with outerKeySelector, to the hash table of items. Because of its implementation, the Join operator result sequence keeps the order of the outer sequence first, and then uses the order of the inner sequence for each outer sequence element.

From an SQL point of view, the example in Listing 4-19 can be thought of as an inner equijoin somewhat like the following SQL query:

    SELECT     o.Month, o.Shipped, p.IdProduct, p.Price
    FROM       Orders AS o
    INNER JOIN Products AS p
          ON   o.IdProduct = p.IdProduct

If you want to translate the SQL syntax into the Join operator syntax, you can think about the column selection in SQL as the resultSelector function, while the equality condition on the IdProduct columns (of orders and products) corresponds to the pair of outerKeySelector and innerKeySelector functions.

The Join operator has a corresponding LINQ syntax, which is shown in Listing 4-20.

Listing 4-20: The Join operator query expression syntax

    var expr =
        from c in customers
            from   o in c.Orders
            join   p in products
                   on o.IdProduct equals p.IdProduct
            select new {o.Month, o.Shipped, p.IdProduct, p.Price };

Important

The order of items to relate (o.IdProduct equals p.IdProduct) in LINQ query syntax must have the outer sequence first and the inner sequence after; otherwise, the LINQ query will not compile. This requirement is different from standard SQL queries, in which item ordering does not matter.
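The inner-join behavior and the ordering of the result can be observed in a small sketch. The order and product tuples are hypothetical sample data, not the chapter's data set, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical sample data.
var orders = new[]
{
    (Month: "January", IdProduct: 1),
    (Month: "May",     IdProduct: 2),
    (Month: "July",    IdProduct: 1),
    (Month: "March",   IdProduct: 9), // no matching product
};
var products = new[]
{
    (IdProduct: 1, Price: 10),
    (IdProduct: 2, Price: 20),
    (IdProduct: 3, Price: 30),        // never ordered
};

// Inner equijoin: unmatched items on either side are dropped,
// and the result follows the order of the outer sequence (orders).
var joined = orders
    .Join(products,
          o => o.IdProduct,
          p => p.IdProduct,
          (o, p) => $"{o.Month}:{p.Price}")
    .ToArray();

// Query syntax: the outer key goes on the left of equals, the inner key on the right.
var fromQuery =
    (from o in orders
     join p in products on o.IdProduct equals p.IdProduct
     select $"{o.Month}:{p.Price}").ToArray();

Console.WriteLine(string.Join(", ", joined));       // January:10, May:20, July:10
Console.WriteLine(joined.SequenceEqual(fromQuery)); // True
```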

GroupJoin

In cases in which you need to define something similar to a LEFT OUTER JOIN or a RIGHT OUTER JOIN, you need to use the GroupJoin operator. Its signatures are quite similar to the Join operator:

    public static IEnumerable<V> GroupJoin<T, U, K, V>(
        this IEnumerable<T> outer,
        IEnumerable<U> inner,
        Func<T, K> outerKeySelector,
        Func<U, K> innerKeySelector,
        Func<T, IEnumerable<U>, V> resultSelector);
    public static IEnumerable<V> GroupJoin<T, U, K, V>(
        this IEnumerable<T> outer,
        IEnumerable<U> inner,
        Func<T, K> outerKeySelector,
        Func<U, K> innerKeySelector,
        Func<T, IEnumerable<U>, V> resultSelector,
        IEqualityComparer<K> comparer);

The only difference is the definition of the resultSelector function. It receives an IEnumerable<U> instead of a single object of type U, because it projects a hierarchical result of type IEnumerable<V>, in which each item extracted from the outer sequence is joined with the group of matching items, of type U, extracted from the inner sequence.

As a result of this behavior, the output is not a flattened inner equijoin, which would be produced by using the Join operator, but a hierarchical sequence of items. Nevertheless, you can define queries using GroupJoin with results equivalent to the Join operator, whenever the mapping is a one-to-one relationship. When an outer element has no corresponding elements in the inner sequence, the GroupJoin operator yields the outer sequence element paired with an empty sequence (Count = 0). In Listing 4-21, you can see an example of this operator.

Listing 4-21: The GroupJoin operator used to map products with orders, if present

    var expr =
        products
        .GroupJoin(
            customers.SelectMany(c => c.Orders),
            p => p.IdProduct,
            o => o.IdProduct,
            (p, orders) => new { p.IdProduct, Orders = orders });

    foreach(var item in expr) {
        Console.WriteLine("Product: {0}", item.IdProduct);
        foreach (var order in item.Orders) {
            Console.WriteLine("    {0}", order);
        }
    }

The following is the result of Listing 4-21:

    Product: 1
        3 - False - January - 1
        10 - False - July - 1
    Product: 2
        5 - True - May - 2
    Product: 3
        20 - True - December - 3
        10 - True - January - 3
    Product: 4
    Product: 5
        20 - False - July - 5
    Product: 6

You can see that products 4 and 6 have no mapping orders, but the query returns them nonetheless. You can think about this operator like a SELECT … FOR XML AUTO query in Transact-SQL in Microsoft SQL Server 2000 and 2005. In fact, it returns results hierarchically grouped like a set of XML nodes nested within their parent nodes, similar to the default result of a FOR XML AUTO query.

In a query expression, the GroupJoin operator is defined as a join … into … clause. The query expression shown in Listing 4-22 is equivalent to Listing 4-21.

Listing 4-22: A query expression with a join into clause

    var customersOrders =
        from c in customers
            from o in c.Orders
            select o;

    var expr =
        from   p in products
        join   o in customersOrders
                    on p.IdProduct equals o.IdProduct
                    into orders
        select new { p.IdProduct, Orders = orders };

In this example, we first define an expression called customersOrders to extract the flat list of orders. (This expression still uses the SelectMany operator.) We could also define a single query expression, nesting the customersOrders expression within the main query. This approach is shown in Listing 4-23.

Listing 4-23: The query expression of Listing 4-22 in its compact version

    var expr =
        from   p in products
        join   o in (
               from c in customers
                   from   o in c.Orders
                   select o
               ) on p.IdProduct equals o.IdProduct
               into orders
        select new { p.IdProduct, Orders = orders };
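The hierarchical result of GroupJoin, including the empty group for an outer item without matches, can be sketched as follows. Product identifiers and orders are hypothetical sample data, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical sample data: product 3 has no orders.
var productIds = new[] { 1, 2, 3 };
var orders = new[]
{
    (Month: "January", IdProduct: 1),
    (Month: "May",     IdProduct: 2),
    (Month: "July",    IdProduct: 1),
};

// Each outer item is paired with the sequence of its matching inner items.
var grouped = productIds
    .GroupJoin(orders,
               id => id,
               o => o.IdProduct,
               (id, matches) => new
               {
                   IdProduct = id,
                   Months = matches.Select(o => o.Month).ToArray()
               })
    .ToArray();

foreach (var g in grouped)
    Console.WriteLine($"Product {g.IdProduct}: [{string.Join(", ", g.Months)}]");
// Product 1: [January, July]
// Product 2: [May]
// Product 3: []

// The equivalent join ... into ... clause.
var fromQuery =
    (from id in productIds
     join o in orders on id equals o.IdProduct into matches
     select new { IdProduct = id, Count = matches.Count() }).ToArray();

Console.WriteLine(fromQuery[2].Count); // 0
```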

Set Operators

Our journey through LINQ operators continues with a group of methods that are used to handle sets of data, applying common set operations (union, intersect, and except) and selecting unique occurrences of items (distinct).

Distinct

Imagine that you want to extract all products that are mapped to orders, avoiding duplicates. This requirement could be solved in standard SQL using a DISTINCT clause within a JOIN query. LINQ provides a Distinct operator, a member of the set operators. Its signature is quite simple. It requires just a source sequence, from which all the distinct occurrences of items will be yielded. An example of the operator is shown in Listing 4-24.

    public static IEnumerable<T> Distinct<T>(
        this IEnumerable<T> source);

Listing 4-24: The Distinct operator applied to the list of products used in orders

    var expr =
        customers
        .SelectMany(c => c.Orders)
        .Join(products,
              o => o.IdProduct,
              p => p.IdProduct,
              (o, p) => p)
        .Distinct();

Distinct does not have an equivalent query expression clause; hence, as we did in Listing 4-15, we can apply this operator to the result of a query expression, as shown in Listing 4-25.

Listing 4-25: The Distinct operator applied to a query expression

    var expr =
        (from c in customers
             from   o in c.Orders
             join   p in products
                    on o.IdProduct equals p.IdProduct
             select p
        ).Distinct();

By default, Distinct compares and identifies elements using their GetHashCode and Equals methods because, internally, it uses a default comparer of type EqualityComparer<T>.Default. We can, if necessary, override our type behavior to change the Distinct result, or we can just use the second overload of the Distinct method.

    public static IEnumerable<T> Distinct<T>(
        this IEnumerable<T> source,
        IEqualityComparer<T> comparer);

This last overload accepts a comparer argument, available to provide a custom comparer for instances of type T.

Note

We will see an example of how to compare reference types in the Union operator examples in Listing 4-26.
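For a quick look at the two Distinct overloads, the sketch below uses a hypothetical array of city names and the built-in StringComparer.OrdinalIgnoreCase as the custom comparer, so no comparer class has to be written. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical sample data with case-only duplicates.
var cities = new[] { "Torino", "torino", "Dallas", "DALLAS", "Brescia" };

// First overload: default equality, so "Torino" and "torino" are different items.
var exact = cities.Distinct().ToArray();

// Second overload: the comparer decides which items count as duplicates.
var ignoringCase = cities.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();

Console.WriteLine(exact.Length);        // 5
Console.WriteLine(ignoringCase.Length); // 3
```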

Union, Intersect, and Except

Within the set operators group, three more operators are useful for classic set operations. They are Union, Intersect, and Except, and they share a similar definition:

    public static IEnumerable<T> Union<T>(
        this IEnumerable<T> first,
        IEnumerable<T> second);
    public static IEnumerable<T> Union<T>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer);
    public static IEnumerable<T> Intersect<T>(
        this IEnumerable<T> first,
        IEnumerable<T> second);
    public static IEnumerable<T> Intersect<T>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer);
    public static IEnumerable<T> Except<T>(
        this IEnumerable<T> first,
        IEnumerable<T> second);
    public static IEnumerable<T> Except<T>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer);

The Union operator yields the first sequence elements and the second sequence elements, skipping duplicates. For instance, in Listing 4-26, you can see how to merge the orders of the second customer with the orders of the third.

Listing 4-26: The Union operator applied to the second and third customer orders

  var expr = customers[1].Orders.Union(customers[2].Orders);

As with the Distinct operator, in Union, Intersect, and Except, the elements are compared by using the GetHashCode and Equals methods in the first overload, or by using a custom comparer in the second overload. Here is the result of Listing 4-26:

    10 - False - July - 1
    20 - True - December - 3
    20 - True - December - 3

The result might seem unexpected because we have two rows that appear to be the same. However, if you look at the initialization code used in all of our examples, each order is a different instance of the Order reference type. Even if the second order of the second customer is semantically equal to the first order of the third customer, they are distinct objects: by default, Equals compares references, and GetHashCode is not based on field values. You can see this effect in the following code, where the second Order of customers[1] and the only Order of customers[2] hold the same values but are separate instances:

    customers[1] = new Customer {Name = "Marco", City = "Torino",
        Country = Countries.Italy, Orders = new Order[] {
        new Order {Quantity = 10, IdProduct = 1 ,
          Shipped = false, Month = "July"},
        new Order {Quantity = 20, IdProduct = 3 ,
          Shipped = true, Month = "December"}}};
    customers[2] = new Customer {Name = "James", City = "Dallas",
        Country = Countries.USA, Orders = new Order[] {
        new Order {Quantity = 20, IdProduct = 3 ,
          Shipped = true, Month = "December"}}};

We have not defined a value type semantic for our Order reference type. To get the expected result, we can implement a value type semantic by overriding the GetHashCode and Equals implementations of the type to be compared. In this situation, it might be useful to do that, as you can see in this new Order implementation:

    public class Order {
        public int Quantity;
        public bool Shipped;
        public string Month;
        public int IdProduct;

        public override string ToString() {
            return String.Format("{0} - {1} - {2} - {3}",
                this.Quantity, this.Shipped, this.Month, this.IdProduct);
        }

        public override bool Equals(object obj) {
            if (!(obj is Order))
                return false;
            else {
                Order o = (Order)obj;
                return (o.IdProduct == this.IdProduct &&
                        o.Month == this.Month &&
                        o.Quantity == this.Quantity &&
                        o.Shipped == this.Shipped);
            }
        }

        public override int GetHashCode() {
            return String.Format("{0}|{1}|{2}|{3}", this.IdProduct,
                this.Month, this.Quantity, this.Shipped).GetHashCode();
        }
    }

Another way to get the correct result is to use the second overload of the Union method, providing a custom comparer for the Order type. A final way to get the expected distinct behavior is to define the Order type as a value type, using struct instead of class in its declaration. However, it is not always possible to define a struct, because sometimes you need to implement an object-oriented infrastructure using type inheritance, which structs do not support.

    // Using struct instead of class, we get a value type
    public struct Order {
        public int Quantity;
        public bool Shipped;
        public string Month;
        public int IdProduct;
    }
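The custom comparer option mentioned above can be sketched as follows. This is a minimal illustration, not code from the chapter's samples: the OrderComparer class name is hypothetical, and the Order class here is a stripped-down version without Equals and GetHashCode overrides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two arrays whose last/only orders have identical field values
// but are distinct instances.
var first = new[] {
    new Order { Quantity = 10, IdProduct = 1, Shipped = false, Month = "July" },
    new Order { Quantity = 20, IdProduct = 3, Shipped = true,  Month = "December" } };
var second = new[] {
    new Order { Quantity = 20, IdProduct = 3, Shipped = true,  Month = "December" } };

// First overload: default (reference) equality keeps the apparent duplicate.
int byReference = first.Union(second).Count();

// Second overload: the comparer treats orders with equal fields as equal.
int byValue = first.Union(second, new OrderComparer()).Count();

Console.WriteLine($"{byReference} vs {byValue}");   // prints "3 vs 2"

// Simplified Order with no Equals/GetHashCode overrides.
public class Order {
    public int Quantity;
    public bool Shipped;
    public string Month = "";
    public int IdProduct;
}

// Hypothetical comparer: field-by-field equality for Order.
public class OrderComparer : IEqualityComparer<Order> {
    public bool Equals(Order x, Order y) {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.IdProduct == y.IdProduct && x.Month == y.Month &&
               x.Quantity == y.Quantity && x.Shipped == y.Shipped;
    }

    // Equal orders must produce equal hash codes.
    public int GetHashCode(Order o) =>
        (o.IdProduct, o.Month, o.Quantity, o.Shipped).GetHashCode();
}
```

A comparer is a reasonable choice when you cannot modify the compared type or when different queries need different notions of equality.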

Remember that an anonymous type is defined as a reference type with a value type semantic. In other words, all anonymous types are defined as a class with an override of GetHashCode and Equals written by the compiler.

In Listing 4-27, you can find an example of using Intersect and Except.

Listing 4-27: The Intersect and Except operators applied to the second and third customer orders

    var expr1 = customers[1].Orders.Intersect(customers[2].Orders);
    var expr2 = customers[1].Orders.Except(customers[2].Orders);

The Intersect operator yields only the elements that occur in both sequences, and the Except operator yields all the elements in the first sequence that are not present in the second sequence. Once again, there are no compact clauses to define set operators in query expressions, but we can apply them to LINQ query results, as in Listing 4-28.

Listing 4-28: Set operators applied to query expressions

    var expr =
        (from c in customers
             from   o in c.Orders
             where  c.Country == Countries.Italy
             select o
        ).Intersect(
            from c in customers
                from   o in c.Orders
                where  c.Country == Countries.USA
                select o);

Value Type vs. Reference Type Semantic

Remember that all the considerations for Union and Distinct operators are also valid for Intersect and Except. In general, they are valid for each operation that involves a comparison of two items made by LINQ to Objects. The result of the Intersect operation illustrated in Listing 4-28 is an empty set whenever the Order type is a reference type that does not override Equals and GetHashCode, like the one defined at the beginning of this chapter, and we do not provide a custom comparer. If you define Order as a value type (using struct instead of class), you get an order (20 - True - December - 3) as the Intersect result. Once again, we want to emphasize that when using LINQ, it is better to use types with a value type semantic, even if they are reference types, so that you get consistent behavior across all regular and anonymous types.

Aggregate Operators

At times, you need to make some aggregations over sequences to make calculations on source items. To accomplish this, LINQ provides the family of aggregate operators that implement the most common aggregate functions: Count, LongCount, Sum, Min, Max, Average, and Aggregate. Many of these operators are simple to use because their behavior is easy to understand.

Count and LongCount

Imagine that you want to list all customers, each one followed by the number of orders the customer has placed. Listing 4-29 shows a query that does this by using the Count operator.

Listing 4-29: The Count operator applied to customer orders

    var expr =
        from   c in customers
        select new {c.Name, c.City, c.Country, OrdersCount = c.Orders.Count() };

The Count operator provides a couple of signatures, as does the LongCount operator:

    public static int Count<T>(
        this IEnumerable<T> source);
    public static int Count<T>(
        this IEnumerable<T> source,
        Func<T, bool> predicate);
    public static long LongCount<T>(
        this IEnumerable<T> source);
    public static long LongCount<T>(
        this IEnumerable<T> source,
        Func<T, bool> predicate);

The overload used in Listing 4-29 is the more common and simpler one; it counts all the items in the source sequence. The second overload accepts a predicate, which must not be null and which is used to filter the items to count. The LongCount variations return a long instead of an int.
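As a small sketch of the overloads just described, the following code counts the items of an integer array. The quantities array is illustrative data, not taken from the chapter's samples.

```csharp
using System;
using System.Linq;

// Illustrative data only.
int[] quantities = { 3, 5, 10, 20, 20 };

int all       = quantities.Count();               // every item
int large     = quantities.Count(q => q >= 10);   // only items matching the predicate
long allAsLong = quantities.LongCount();          // same count, returned as a long

Console.WriteLine($"{all} {large} {allAsLong}");  // prints "5 3 5"
```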

Sum

The Sum operator requires more attention because it has multiple definitions:

    public static Numeric Sum(
        this IEnumerable<Numeric> source);
    public static Numeric Sum<T>(
        this IEnumerable<T> source,
        Func<T, Numeric> selector);

We used Numeric in the syntax to generalize the return type of the Sum operator. In practice, it has many definitions, one for each of the main Numeric types: int, int?, long, long?, float, float?, double, double?, decimal, and decimal?.

Important

As you probably know, in C# 2.0 and later, the question mark that appears after a value type name (T?) defines a nullable type (Nullable<T>) of this type. For instance, int? means Nullable<System.Int32>.

The first implementation sums the source sequence items, assuming that the items are all the same numeric type, and returns the result. In the case of an empty source sequence, zero is returned. In the case of nullable types, null items are skipped, and the result is zero (not null) even when every item is null. This implementation can be used when the items can be summed directly. For example, we can sum an array of integers as in this code:

    int[] values = { 1, 3, 9, 29 };
    int   total  = values.Sum();
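The behavior of Sum with empty and nullable sources can be checked with a short sketch like the following. The arrays are illustrative; the point is that LINQ to Objects skips null items and returns zero rather than null.

```csharp
using System;
using System.Linq;

int?[] withNulls = { 1, null, 9 };
int?[] onlyNulls = { null, null };

int? partial = withNulls.Sum();                  // null items are skipped
int? nothing = onlyNulls.Sum();                  // zero, not null
int  empty   = Enumerable.Empty<int>().Sum();    // zero for an empty source

Console.WriteLine($"{partial} {nothing} {empty}");   // prints "10 0 0"
```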

When the sequence is not made up of simple Numeric types, we need to extract values to be summed from each item in the source sequence. To do that, we can use the second overload, which accepts a selector argument. You can see an example of this syntax in Listing 4-30.

Listing 4-30: The Sum operator applied to customer orders

    var customersOrders =
        from c in customers
            from   o in c.Orders
            join   p in products
                   on o.IdProduct equals p.IdProduct
            select new { c.Name, OrderAmount = o.Quantity * p.Price };
    var expr =
        from   c in customers
        join   o in customersOrders
               on c.Name equals o.Name
               into customersWithOrders
        select new { c.Name,
                     TotalAmount = customersWithOrders.Sum(o => o.OrderAmount) };

In Listing 4-30, we join customers with the customersOrders sequence, returning for each customer the total amount of that customer's orders, calculated with the Sum operator. As usual, we can collapse the previous code using nested queries, which is the approach shown in Listing 4-31.

Listing 4-31: The Sum operator applied to customer orders, with a nested query

    var expr =
        from   c in customers
        join   o in (
               from c in customers
                   from   o in c.Orders
                   join   p in products
                          on o.IdProduct equals p.IdProduct
                   select new { c.Name, OrderAmount = o.Quantity * p.Price }
               ) on c.Name equals o.Name
               into customersWithOrders
        select new { c.Name,
                     TotalAmount = customersWithOrders.Sum(o => o.OrderAmount) };

SQL vs. LINQ Query Syntax

At this point, we want to make a comparison with SQL syntax, because there are similarities but also important differences. The following is an SQL statement similar to the query expression in Listing 4-31, assuming that customer names are unique:

    SELECT   c.Name, SUM(o.OrderAmount) AS OrderAmount
    FROM     customers AS c
    INNER JOIN (
        SELECT     c.Name, o.Quantity * p.Price AS OrderAmount
        FROM       customers AS c
        INNER JOIN orders AS o ON c.Name = o.Name
        INNER JOIN products AS p ON o.IdProduct = p.IdProduct
        ) AS o
    ON       c.Name = o.Name
    GROUP BY c.Name

You can see that this SQL syntax is redundant. In fact, we can obtain the same result with this simpler SQL query:

    SELECT   c.Name, SUM(o.OrderAmount) AS OrderAmount
    FROM     customers AS c
    INNER JOIN (
        SELECT     o.Name, o.Quantity * p.Price AS OrderAmount
        FROM       orders AS o
        INNER JOIN products AS p ON o.IdProduct = p.IdProduct
        ) AS o
    ON       c.Name = o.Name
    GROUP BY c.Name

But it can be simpler and shorter still, as in the following SQL query:

    SELECT     c.Name, SUM(o.Quantity * p.Price) AS OrderAmount
    FROM       customers AS c
    INNER JOIN orders AS o ON c.Name = o.Name
    INNER JOIN products AS p ON o.IdProduct = p.IdProduct
    GROUP BY   c.Name

If we started from this last SQL query and tried to write a corresponding query expression using LINQ, we would probably encounter some difficulties. The reason is that SQL queries data through relationships, but all data is flat (in tables) until it is queried. LINQ, on the other hand, handles data that can have native hierarchical relationships, such as our Customer/Orders/Products data. This difference implies that sometimes one approach has advantages over the other, depending on the kind of query and the kind of data you are working on.

For these reasons, the best expression of a query can appear differently in SQL and in LINQ query expression syntax, even if the query is getting the same results from the same data.

Min and Max

Within the set of aggregate operators, Min and Max calculate the minimum and maximum values of the source sequence, respectively. Both of these extension methods provide a rich set of overloads:

    public static Numeric Min/Max(
        this IEnumerable<Numeric> source);
    public static T Min<T>/Max<T>(
        this IEnumerable<T> source);
    public static Numeric Min<T>/Max<T>(
        this IEnumerable<T> source,
        Func<T, Numeric> selector);
    public static S Min<T, S>/Max<T, S>(
        this IEnumerable<T> source,
        Func<T, S> selector);

The first signature, as in the Sum operator, provides many definitions for the main numeric types (int, int?, long, long?, float, float?, double, double?, decimal, and decimal?), and it computes the minimum or maximum value on an arithmetic basis, using the elements of the source sequence. This signature is useful when the source elements are numbers by themselves, as in Listing 4-32.

Listing 4-32: The Min operator applied to order quantities

    var expr =
        (from c in customers
             from   o in c.Orders
             select o.Quantity
        ).Min();

The second signature computes the minimum or maximum value of the source elements regardless of their type. The comparison is made using the IComparable<T> interface implementation, if supported by the source elements, or the nongeneric IComparable interface implementation. If the source type T does not implement either of these interfaces, an ArgumentException error will be thrown, with an Exception.Message equal to “At least one object must implement IComparable.” To examine this situation, take a look at Listing 4-33, in which the resulting anonymous type does not implement either of the interfaces required by the Min operator.

Listing 4-33: The Min operator applied to wrong types (thereby throwing an ArgumentException)

    var expr =
        (from c in customers
             from o in c.Orders
             select new { o.Quantity}
        ).Min();

In the case of an empty source or a source that contains only null values, the result will be null whenever the Numeric type is a nullable type; otherwise, for an empty source, an InvalidOperationException will be thrown. (An ArgumentNullException is thrown only when the source sequence itself or the selector is null.) The selector function, available in the last two signatures, defines how to extract values from the source sequence elements. For instance, you can use these overloads to avoid errors related to missing interface implementations (IComparable<T>/IComparable), as in Listing 4-34.

Listing 4-34: The Min operator applied to custom types, with a value selector

    var expr =
        (from c in customers
             from o in c.Orders
             select new { o.Quantity}
        ).Min(o => o.Quantity);
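The empty-source behavior of Min can be verified with a sketch such as this one. It uses Enumerable.Empty and is an illustration under the assumption of LINQ to Objects, where a nullable element type yields null and a non-nullable one throws InvalidOperationException.

```csharp
using System;
using System.Linq;

// Nullable element type: an empty source yields null.
int? nullableMin = Enumerable.Empty<int?>().Min();

// Non-nullable element type: an empty source throws.
bool threw = false;
try {
    Enumerable.Empty<int>().Min();
}
catch (InvalidOperationException) {
    threw = true;
}

Console.WriteLine($"{nullableMin.HasValue} {threw}");   // prints "False True"
```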

Average

The Average operator calculates the arithmetic average of a set of values extracted from a source sequence. Like the previous operators, this function works with the source elements themselves or with values extracted using a selector function:

    public static Result Average(
        this IEnumerable<Numeric> source);
    public static Result Average<T>(
        this IEnumerable<T> source,
        Func<T, Numeric> selector);

The Numeric type can be int, int?, long, long?, float, float?, double, double?, decimal, or decimal?. The Result type always reflects the “nullability” of the numeric type. When the Numeric type is int or long, the Result type is double. When the Numeric type is int? or long?, the Result type is double?. Otherwise, the Numeric and Result types are the same.
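The mapping between Numeric and Result types described above can be seen in a short sketch. The explicit variable types document which overload the compiler selects; the arrays are illustrative values only.

```csharp
using System;
using System.Linq;

double  fromInts      = new[] { 1, 2 }.Average();              // int     -> double
double? fromNullables = new int?[] { 1, null, 2 }.Average();   // int?    -> double?, nulls ignored
decimal fromDecimals  = new[] { 1m, 2m }.Average();            // decimal -> decimal
double? fromNothing   = new int?[0].Average();                 // empty nullable source -> null

Console.WriteLine(fromInts == 1.5 && fromNullables == 1.5 &&
                  fromDecimals == 1.5m && fromNothing == null);   // prints "True"
```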

When the sum of the values used to compute the arithmetic average is too large for the result type, an OverflowException error is thrown. Because of its definition, the Average operator's first signature can be invoked only on a Numeric sequence. If you want to invoke it on a sequence of non-numeric items, you need to provide a selector function. In Listing 4-35, you can see an example of both of the overloads.

Listing 4-35: Both Average operator signatures applied to product prices

    var expr1 =
        (from p in products
         select p.Price
        ).Average();
    var expr2 =
        (from p in products
         select new { p.Price }
        ).Average(p => p.Price);

The second signature is useful when you are defining a query in which the average is just one of the results to extract. An example is shown in Listing 4-36, where we extract all customers and their average order amounts.

Listing 4-36: Customers and their average order amounts

    var expr =
        from   c in customers
        join   o in (
               from c in customers
                   from   o in c.Orders
                   join   p in products
                          on o.IdProduct equals p.IdProduct
                   select new { c.Name, OrderAmount = o.Quantity * p.Price }
               ) on c.Name equals o.Name
               into customersWithOrders
        select new { c.Name,
                     AverageAmount = customersWithOrders.Average(o => o.OrderAmount) };

The results will be similar to the following:

    {Name=Paolo, AverageAmount=65}
    {Name=Marco, AverageAmount=350}
    {Name=James, AverageAmount=600}
    {Name=Frank, AverageAmount=1000}

Aggregate

The last operator in this set is Aggregate. Take a look at its definition:

    public static T Aggregate<T>(
        this IEnumerable<T> source,
        Func<T, T, T> func);
    public static U Aggregate<T, U>(
        this IEnumerable<T> source,
        U seed,
        Func<U, T, U> func);
    public static V Aggregate<T, U, V>(
        this IEnumerable<T> source,
        U seed,
        Func<U, T, U> func,
        Func<U, V> resultSelector);

This operator repeatedly invokes the func function, storing the result in an accumulator. Every step calls the function with the current accumulator value as the first argument, starting from seed, and with the current element within the source sequence as the second argument. At the end of the iteration, the operator returns the final accumulator value.

The only difference between the first two signatures is that the second requires an explicit value for the seed of type U. The first signature uses the first element in the source sequence as the seed and infers the seed type from the source sequence itself. The third signature looks like the second, but it also requires a resultSelector function, which is called on the final accumulator value to produce the result.
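The three signatures can be compared side by side in a minimal sketch. The values array and the report string format are illustrative assumptions, not part of the chapter's sample data.

```csharp
using System;
using System.Linq;

int[] values = { 1, 3, 9, 29 };

// No seed: the first element (1) is the initial accumulator value.
int max = values.Aggregate((acc, v) => acc > v ? acc : v);

// Explicit seed: the accumulator type (long) differs from the element type (int).
long sum = values.Aggregate(0L, (acc, v) => acc + v);

// Seed plus resultSelector: the final accumulator is converted to a string.
string report = values.Aggregate(0L, (acc, v) => acc + v,
                                 total => "total=" + total);

Console.WriteLine($"{max} {sum} {report}");   // prints "29 42 total=42"
```

Note that the seedless overload throws an InvalidOperationException on an empty source, because there is no first element to use as the seed.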

In Listing 4-37, we use the Aggregate operator to extract the most expensive order for each customer.

Listing 4-37: Customers and their most expensive orders

    var expr =
        from   c in customers
        join   o in (
               from c in customers
                   from   o in c.Orders
                   join   p in products
                          on o.IdProduct equals p.IdProduct
                   select new { c.Name, o.IdProduct,
                                OrderAmount = o.Quantity * p.Price }
               ) on c.Name equals o.Name
               into orders
        select new { c.Name,
                     MaxOrderAmount =
                         orders
                         .Aggregate((t, s) => t.OrderAmount > s.OrderAmount ?
                                              t : s)
                         .OrderAmount };

As you can see, the function called by the Aggregate operator compares the OrderAmount property of each order executed by the current customer and accumulates the more expensive one. At the end of each customer aggregation, the accumulator will contain the most expensive order, and its OrderAmount property will be projected into the final result, coupled with the customer Name property. The following is the output from this query:

    {Name=Paolo, MaxOrderAmount=100}
    {Name=Marco, MaxOrderAmount=600}
    {Name=James, MaxOrderAmount=600}
    {Name=Frank, MaxOrderAmount=1000}

In Listing 4-38, you can see another sample of aggregation. This example calculates the total ordered amount for each product.

Listing 4-38: Products and their ordered amounts

    var expr =
        from   p in products
        join   o in (
               from c in customers
                   from   o in c.Orders
                   join   p in products
                          on o.IdProduct equals p.IdProduct
                   select new { p.IdProduct, OrderAmount = o.Quantity * p.Price }
               ) on p.IdProduct equals o.IdProduct
               into orders
        select new { p.IdProduct,
                     TotalOrderedAmount =
                        orders
                        .Aggregate(0m, (a, o) => a + o.OrderAmount)};

Here is the output of this query:

    {IdProduct=1, TotalOrderedAmount=130}
    {IdProduct=2, TotalOrderedAmount=100}
    {IdProduct=3, TotalOrderedAmount=1200}
    {IdProduct=4, TotalOrderedAmount=0}
    {IdProduct=5, TotalOrderedAmount=1000}
    {IdProduct=6, TotalOrderedAmount=0}

In this second sample, the aggregate function uses an accumulator of Decimal type. It is initialized to zero (seed = 0m) and accumulates the OrderAmount values for every step. The result of this function will also be a Decimal type.

Both of the previous examples could also be defined by invoking the Max or Sum operators, respectively. They are shown in this section to help you learn about the Aggregate operator’s behavior. In general, keep in mind that the Aggregate operator is useful whenever there are no specific aggregation operators available; otherwise, you should use an operator such as Min, Max, Sum, and so on. For instance, consider the example in Listing 4-39.

Listing 4-39: Customers and their most expensive orders paired with the month of execution

    var expr =
        from   c in customers
        join   o in (
               from c in customers
                   from   o in c.Orders
                   join   p in products
                          on o.IdProduct equals p.IdProduct
                   select new { c.Name, o.IdProduct, o.Month,
                                OrderAmount = o.Quantity * p.Price }
               ) on c.Name equals o.Name into orders
        select new { c.Name,
                     MaxOrder =
                         orders
                         .Aggregate( new { Amount = 0m, Month = String.Empty },
                                     (t, s) => t.Amount > s.OrderAmount
                                               ? t
                                               : new { Amount = s.OrderAmount,
                                                       Month = s.Month })};

The result of Listing 4-39 is shown here:

    {Name=Paolo, MaxOrder={Amount=100, Month=May}}
    {Name=Marco, MaxOrder={Amount=600, Month=December}}
    {Name=James, MaxOrder={Amount=600, Month=December}}
    {Name=Frank, MaxOrder={Amount=1000, Month=July}}

In this example, the Aggregate operator returns an instance of a new anonymous type, projected into the MaxOrder property: it is a tuple composed of the amount and month of the most expensive order made by each customer. The Aggregate operator used here cannot be replaced by any of the other predefined aggregate operators because of its specific behavior and result type.

Note

For further information about anonymous types, refer to Chapter 2, “C# Language Features,” or Chapter 3, “Microsoft Visual Basic 9.0 Language Features.”

The only way to produce a similar result using standard aggregate operators is to call two different aggregators. That would require two source sequence scannings: one to get the max amount and one to get its month. Be sure to pay attention to the seed definition, which declares the resulting anonymous type that will be used by the aggregation function as well.

Generation Operators

When working with data by applying aggregates, arithmetic operations, and mathematical functions, sometimes you need to also iterate over numbers or item collections. For example, think about a query that needs to extract orders placed for a particular set of years, between 2000 and 2007, or a query that needs to repeat the same operation over the same data. The generation operators are useful for operations such as these.

Range

The first operator of this set is Range. It is a simple static method that yields a sequence of count consecutive integers beginning with start, as shown in its signature:

    public static IEnumerable<int> Range(
        int start,
        int count);

The code in Listing 4-40 illustrates a means to filter orders for the years between 2005 and 2007.

Important

Please note that in the following example, a where condition would be more appropriate because we are iterating orders many times. The example in Listing 4-40 is provided only for demonstration and is not the best solution for the specific query.

Listing 4-40: A set of years generated by the Range operator, used to filter orders

    var expr =
        Enumerable.Range(2005, 3)
        .SelectMany(x => (from   o in orders
                          where  o.Year == x
                          select new { o.Year, o.Amount }));

The Range operator can also be used to implement classical mathematical operations such as square, power, factorial, and so on. Listing 4-41 shows an example of using Range and Aggregate to calculate the factorial of a number.

Listing 4-41: A factorial of a number using the Range operator

    static int Factorial(int number) {
        return (Enumerable.Range(0, number + 1)
                .Aggregate(0, (s, t) => t == 0 ? 1 : s *= t));
    }
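As an alternative sketch of the same idea, the range can start at 1 and the seed can be 1, which removes the special case for zero. This variant is an assumption of ours rather than the listing's code, and it uses a long accumulator only to delay overflow.

```csharp
using System;
using System.Linq;

// Range(1, number) yields 1..number; for number == 0 it is empty,
// so Aggregate returns the seed (1), which is the factorial of 0.
static long Factorial(int number) =>
    Enumerable.Range(1, number).Aggregate(1L, (acc, n) => acc * n);

Console.WriteLine(Factorial(5));   // prints "120"
```

Like the original listing, this sketch does not guard against overflow for large inputs, and a negative argument makes Range throw an ArgumentOutOfRangeException.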

Repeat

Another generation operator is Repeat, which returns a set of count occurrences of element. When the element is an instance of a reference type, each repetition returns a reference to the same instance, not a copy of it.

    public static IEnumerable<T> Repeat<T>(
        T element,
        int count);

The Repeat operator is useful for initializing enumerations (using the same element for all instances) or for repeating the same query many times. In Listing 4-42, we repeat the customer name selection two times.

Listing 4-42: The Repeat operator, used to repeat the same query many times

    var expr =
        Enumerable.Repeat( (from   c in customers
                            select c.Name), 2)
        .SelectMany(x => x);

Please note that in this example, Repeat returns a sequence of sequences, formed by two lists of customer names. For this reason, we used SelectMany to get a flat list of names.
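The reference-sharing behavior of Repeat described earlier can be observed with a small sketch. The shared list is a hypothetical stand-in for any reference-type element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var shared = new List<string>();

// Three "copies" that are in fact the same List<string> instance.
List<string>[] copies = Enumerable.Repeat(shared, 3).ToArray();

copies[0].Add("Paolo");   // a change through one element...

bool sameInstance = ReferenceEquals(copies[0], copies[2]);
int countViaLast  = copies[2].Count;   // ...is visible through all of them

Console.WriteLine($"{sameInstance} {countViaLast}");   // prints "True 1"
```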

Empty

The last of the generation operators is Empty, which is used to create an empty enumeration of a particular type T. This operation can be useful to initialize empty sequences.

 public static IEnumerable<T> Empty<T>();

Listing 4-43 provides an example that uses Empty to initialize an empty enumeration of Customer.

Listing 4-43: The Empty operator used to initialize an empty set of customers

  IEnumerable<Customer> customers = Enumerable.Empty<Customer>();

Quantifier Operators

Imagine that you need to check for the existence of elements within a sequence, based on conditions or selection rules. You could first select items with restriction operators and then use aggregate operators such as Count to determine whether any item that verifies the condition exists. There is, however, a set of operators, called quantifiers, specifically used to check for existence conditions over sequences.

Any

The first operator we will describe in this group is the Any method, which evaluates a predicate and returns a Boolean result:

    public static bool Any<T>(
        this IEnumerable<T> source,
        Func<T, bool> predicate);
    public static bool Any<T>(
        this IEnumerable<T> source);

As you can see from the method’s signatures, the method has an overload that requires only the source sequence, without a predicate. This method returns true when at least one element in the source sequence exists or false if the source sequence is empty. To optimize its execution, Any returns as soon as a result is available. In Listing 4-44, you can see an example that determines whether there is any order of product one (IdProduct == 1) within all the customer orders.

Listing 4-44: The Any operator applied to all customer orders to check orders of IdProduct == 1

    bool result =
        (from c in customers
             from   o in c.Orders
             select o)
        .Any(o => o.IdProduct == 1);
    result = Enumerable.Empty<Order>().Any(o => o.IdProduct == 1);

In this example, the operator evaluates items only until the first order matching the condition (IdProduct == 1) is found. The second example in Listing 4-44 illustrates a trivial example of the Any operator with a false result, using the Empty operator described earlier.

All

When you want to determine whether all of the items of a sequence verify a filtering condition, you can use the All operator. It returns a true result only if the condition is verified by all the elements in the source sequence:

    public static bool All<T>(
        this IEnumerable<T> source,
        Func<T, bool> predicate);

For instance, in Listing 4-45 we determine whether every order has a positive quantity.

Listing 4-45: The All operator applied to all customer orders to check the quantity

    bool result =
        (from c in customers
             from o in c.Orders
             select o)
        .All(o => o.Quantity > 0);
    result = Enumerable.Empty<Order>().All(o => o.Quantity > 0);

Important

The All operator applied to an empty sequence always returns true. The internal operator implementation in LINQ to Objects enumerates the source sequence items and returns false as soon as it finds an element that does not verify the predicate. If the sequence is empty, the predicate is never called and true is returned.
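A brief sketch makes the empty-sequence results and the early exit of the quantifiers visible. The calls counter is a hypothetical instrumentation variable added only to show how many times the predicate runs.

```csharp
using System;
using System.Linq;

int[] none = { };

bool anyOnEmpty = none.Any(n => n > 0);   // false: no element can match
bool allOnEmpty = none.All(n => n > 0);   // true: no element can fail

// Count predicate invocations to observe that Any stops at the first match.
int calls = 0;
bool found = new[] { 5, 1, 7, 1 }.Any(n => { calls++; return n == 1; });

Console.WriteLine($"{anyOnEmpty} {allOnEmpty} {found} {calls}");
// prints "False True True 2"
```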

Contains

The last quantifier operator is the Contains extension method, which determines whether a source sequence contains a specific item value:

    public static bool Contains<T>(
        this IEnumerable<T> source,
        T value);
    public static bool Contains<T>(
        this IEnumerable<T> source,
        T value,
        IEqualityComparer<T> comparer);

In the LINQ to Objects implementation, the method tries to use the Contains method of ICollection<T> if the source sequence implements this interface. When ICollection<T> is not implemented, Contains enumerates the items in source until it finds a match, comparing each one with the given value of type T. The comparison uses the custom comparer if one is provided (the second method overload) or EqualityComparer<T>.Default otherwise.

In Listing 4-46, you can see an example of the Contains method as it is used to check for the existence of a specific order within the collection of orders of a customer.

Listing 4-46: The Contains operator applied to the first customer’s orders

    var orderOfProductOne = new Order {Quantity = 3, IdProduct = 1 ,
        Shipped = false, Month = "January"};
    bool result = customers[0].Orders.Contains(orderOfProductOne);

Because of its behavior, the Contains method invoked in Listing 4-46 returns true only if you pass the same instance of Order that is stored in the sequence. Otherwise, you need a custom comparer or a value type semantic for the Order type (a reference type that overrides the GetHashCode and Equals methods or a value type, as we have already seen) to look for an equivalent order in the sequence.

Partitioning Operators

Selection and filtering operations sometimes need to be applied only to a subset of the elements of the source sequence. For instance, you might need to extract only the first N elements that verify a condition. You can use the Where and Select operators with the zero-based index argument of their predicate, but this approach is not always useful and intuitive. It is better to have specific operators for these kinds of operations because they are performed quite frequently.

A set of partitioning operators is provided to satisfy these needs. Take and TakeWhile select the first N items or the first items that verify a predicate, respectively. Skip and SkipWhile complement the Take and TakeWhile operators, skipping the first N items or the first items that verify a predicate.

Take

We will start with the Take operator:

    public static IEnumerable<T> Take<T>(
        this IEnumerable<T> source,
        int count);

The Take operator requires a count argument that represents the number of items to take from the source sequence. A zero or negative count produces an empty result; a count larger than the sequence size returns the full source sequence. This method is useful for all queries in which you need the top N items. For instance, you could use this method to select the top N customers based on their order amount, as shown in Listing 4-47.
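The edge cases for count can be checked with a short sketch. The names array is illustrative data that borrows the customer names used in this chapter's output samples.

```csharp
using System;
using System.Linq;

string[] names = { "Paolo", "Marco", "James", "Frank" };

string[] firstTwo = names.Take(2).ToArray();   // the first two items
int negative = names.Take(-1).Count();         // negative count: empty result
int oversize = names.Take(10).Count();         // count beyond the size: all items

Console.WriteLine($"{string.Join(",", firstTwo)} {negative} {oversize}");
// prints "Paolo,Marco 0 4"
```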

Listing 4-47: The Take operator, applied to extract the two top customers ordered by order amount

    var topTwoCustomers =
        (from    c in customers
         join    o in (
                 from c in customers
                     from   o in c.Orders
                     join   p in products
                            on o.IdProduct equals p.IdProduct
                     select new { c.Name, OrderAmount = o.Quantity * p.Price }
                 ) on c.Name equals o.Name
                 into customersWithOrders
         let     TotalAmount = customersWithOrders.Sum(o => o.OrderAmount)
         orderby TotalAmount descending
         select  new { c.Name, TotalAmount }
        ).Take(2);

As you can see, the Take operator call is quite simple, while the whole query is more elaborate. The query contains several of the basic elements and operators we have previously discussed. The let clause, in addition to Take, is the only clause that we have not already seen in action. The let keyword is useful to define an alias for a value or for a variable representing a formula. In this sample, we need to use the sum of all order amounts on a customer basis as a value to project into the resulting anonymous type. At the same time, the same value is used as a sorting condition. Therefore, we defined an alias named TotalAmount to avoid duplicating the formula.

TakeWhile

The TakeWhile operator works like the Take operator, but it checks a formula to extract items instead of using a counter. Here are the method’s signatures:

 public static IEnumerable<T> TakeWhile<T>(
     this IEnumerable<T> source,
     Func<T, bool> predicate);
 public static IEnumerable<T> TakeWhile<T>(
     this IEnumerable<T> source,
     Func<T, int, bool> predicate);

There are two overloads of the method. The first requires a predicate that will be evaluated on each source sequence item. The method enumerates the source sequence and yields items while the predicate is true; it stops the enumeration as soon as the predicate returns false, or when the end of the source is reached. In the second overload, the predicate also receives the zero-based index of each item within the source sequence, so the stop condition can depend on the item's position as well as its value.
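As a small illustration of the two overloads, here is a sketch over a hypothetical array of amounts (the data and variable names are assumptions, not part of the chapter's samples). Note that TakeWhile stops at the first failing item and never looks at later ones.

```csharp
using System;
using System.Linq;

// Hypothetical data, used only for illustration.
int[] amounts = { 50, 40, 30, 45, 10 };

// First overload: stops at 30, so the later 45 is never yielded.
var whileLarge = amounts.TakeWhile(a => a >= 40).ToArray();

// Second overload: the predicate also receives each item's zero-based index.
var firstThreeLarge = amounts
    .TakeWhile((a, index) => a >= 30 && index < 3)
    .ToArray();

Console.WriteLine(string.Join(", ", whileLarge));       // 50, 40
Console.WriteLine(string.Join(", ", firstThreeLarge));  // 50, 40, 30
```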

Imagine that you want to identify your top customers, generating a list that makes up a minimum aggregate amount of orders. The problem looks similar to the one we solved with the Take operator in Listing 4-47, but we do not know how many customers we need to examine. TakeWhile can solve the problem by using a predicate that calculates the aggregate amount and uses that number to stop the enumeration when the target is reached. The resulting query is shown in Listing 4-48.

Listing 4-48: The TakeWhile operator, applied to extract the top customers that form 80 percent of all orders

  // globalAmount is the total amount for all the orders
  var limitAmount = globalAmount * 0.8m;
  var aggregated = 0m;
  var topCustomers =
    (from    c in customers
     join    o in (
             from c in customers
                 from   o in c.Orders
                 join   p in products
                        on o.IdProduct equals p.IdProduct
                 select new { c.Name, OrderAmount = o.Quantity * p.Price }
             ) on c.Name equals o.Name
             into customersWithOrders
     let     TotalAmount = customersWithOrders.Sum(o => o.OrderAmount)
     orderby TotalAmount descending
     select  new { c.Name, TotalAmount }
    )
    .TakeWhile( X => {
                    bool result = aggregated < limitAmount;
                    aggregated += X.TotalAmount;
                    return result;
                } );
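To make the running-total technique easier to try in isolation, here is a reduced sketch. It assumes hypothetical per-customer totals that are already sorted in descending order, in place of the join in Listing 4-48. The predicate has the same shape as the one in the listing.

```csharp
using System;
using System.Linq;

// Hypothetical per-customer totals, already sorted in descending order.
var totals = new[]
{
    (Name: "Marco", TotalAmount: 500m),
    (Name: "Paolo", TotalAmount: 300m),
    (Name: "Frank", TotalAmount: 150m),
    (Name: "James", TotalAmount: 50m),
};

var globalAmount = totals.Sum(t => t.TotalAmount);   // 1000
var limitAmount = globalAmount * 0.8m;               // 800
var aggregated = 0m;

var topCustomers = totals
    .TakeWhile(t =>
    {
        // Check the running total before adding the current customer.
        bool result = aggregated < limitAmount;
        aggregated += t.TotalAmount;
        return result;
    })
    .ToList();   // materialize once, because the predicate mutates 'aggregated'

foreach (var c in topCustomers)
    Console.WriteLine($"{c.Name}: {c.TotalAmount}");
```

Because the predicate changes a captured variable and the query is evaluated lazily, enumerating the query a second time without resetting aggregated would give a different result. Materializing the result with ToList, as in this sketch, is one way to avoid that surprise.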

Skip and SkipWhile

The Skip and SkipWhile signatures are very similar to those for Take and TakeWhile:

 public static IEnumerable<T> Skip<T>(
     this IEnumerable<T> source,
     int count);
 public static IEnumerable<T> SkipWhile<T>(
     this IEnumerable<T> source,
     Func<T, bool> predicate);
 public static IEnumerable<T> SkipWhile<T>(
     this IEnumerable<T> source,
     Func<T, int, bool> predicate);

As we mentioned previously, these operators complement the Take and TakeWhile pair. In fact, each of the following expressions returns the full sequence of customers, provided that the sequence contains no duplicate items (Union discards duplicates, whereas Concat would preserve them):

 var result1 = customers.Take(3).Union(customers.Skip(3));
 var result2 = customers.TakeWhile(p).Union(customers.SkipWhile(p));

The only point of interest is that SkipWhile skips the source sequence items while the predicate evaluates to true and starts yielding items as soon as the predicate result is false, suspending the predicate evaluation on all the remaining items.
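The sketch below illustrates these points on a hypothetical array of integers (the data is an assumption made for the example). It shows that SkipWhile stops testing after the first failing item, and how Concat and Union differ when rebuilding a sequence that contains duplicates.

```csharp
using System;
using System.Linq;

// Hypothetical data containing duplicates.
int[] numbers = { 1, 2, 3, 10, 2, 1 };

var skipped = numbers.Skip(2).ToArray();                       // 3, 10, 2, 1

// Once 10 fails the predicate, the trailing 2 and 1 are yielded untested.
var skippedWhile = numbers.SkipWhile(n => n < 10).ToArray();   // 10, 2, 1

// Take and Skip split the sequence in two; Concat restores it, duplicates included.
var rebuilt = numbers.Take(3).Concat(numbers.Skip(3)).ToArray();

// Union applies set semantics and drops the repeated 2 and 1.
var viaUnion = numbers.Take(3).Union(numbers.Skip(3)).ToArray();   // 1, 2, 3, 10

Console.WriteLine(string.Join(", ", skippedWhile));
Console.WriteLine(string.Join(", ", viaUnion));
```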

Element Operators

Element operators work with single items of a sequence. They extract a specific element by position or by using a predicate, or return a default value when the element is missing.

First

We will start with the First method, which extracts the first element in the sequence by using a predicate or a positional rule:

 public static T First<T>(
     this IEnumerable<T> source);
 public static T First<T>(
     this IEnumerable<T> source,
     Func<T, bool> predicate);

The first overload returns the first element in the source sequence, and the second overload uses a predicate to identify the first element to return. If there are no elements that verify the predicate or there are no elements at all in the source sequence, the operator will throw an InvalidOperationException error. Listing 4-49 shows an example of the First operator.

Listing 4-49: The First operator, used to select the first American customer

  var item = customers.First(c => c.Country == Countries.USA);

Of course, this example could be defined by using a Where and Take operator. However, the First method better demonstrates the intention of the query, and it also guarantees a single (partial) scan of the source sequence.
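Here is a short sketch of both overloads and of the failure case, using a hypothetical array of city names rather than the chapter's customers collection.

```csharp
using System;
using System.Linq;

// Hypothetical data, used only for illustration.
string[] cities = { "Brescia", "Dallas", "Seattle", "Torino" };

var first = cities.First();                                                     // "Brescia"
var firstWithS = cities.First(c => c.StartsWith("S", StringComparison.Ordinal)); // "Seattle"

string outcome;
try
{
    // No city starts with "Z", so First throws.
    cities.First(c => c.StartsWith("Z", StringComparison.Ordinal));
    outcome = "found";
}
catch (InvalidOperationException)
{
    outcome = "no match";
}

Console.WriteLine($"{first}, {firstWithS}, {outcome}");
```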

FirstOrDefault

If you need to find the first element only if it exists, without any exception in case of failure, you can use the FirstOrDefault method. This method works like First, but if there are no elements that verify the predicate or if the source sequence is empty, it returns a default value:

 public static T FirstOrDefault<T>(
     this IEnumerable<T> source);
 public static T FirstOrDefault<T>(
     this IEnumerable<T> source,
     Func<T, bool> predicate);

The default value returned is default(T), both for an empty source and when no element verifies the predicate. default(T) is null for reference types and nullable types; for other value types it is the zero-initialized value, such as 0 for int. If no predicate argument is provided, the method returns the first element of the source if it exists. An example is shown in Listing 4-50.

Listing 4-50: Examples of the FirstOrDefault operator syntax

  var item = customers.FirstOrDefault(c => c.City == "Las Vegas");
  Console.WriteLine(item == null ? "null" : item.ToString()); // returns null

  IEnumerable<Customer> emptyCustomers = Enumerable.Empty<Customer>();
  item = emptyCustomers.FirstOrDefault(c => c.City == "Las Vegas");
  Console.WriteLine(item == null ? "null" : item.ToString()); // returns null
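Listing 4-50 works with a reference type, so the default is always null. The sketch below, which uses hypothetical arrays, shows what default(T) looks like for a value type, a nullable type, and a reference type.

```csharp
using System;
using System.Linq;

// Hypothetical data, used only for illustration.
int[] numbers = { 3, 8, 15 };
string[] names = { "Paolo", "Marco" };

// Value type: no match returns default(int), which is 0, not null.
var firstOver100 = numbers.FirstOrDefault(n => n > 100);

// Nullable type: projecting to int? makes "no match" distinguishable from 0.
var firstOver100Nullable = numbers.Select(n => (int?)n).FirstOrDefault(n => n > 100);

// Reference type: no match returns null.
var firstLongName = names.FirstOrDefault(n => n.Length > 10);

Console.WriteLine(firstOver100);
Console.WriteLine(firstOver100Nullable.HasValue ? "has value" : "null");
Console.WriteLine(firstLongName == null ? "null" : firstLongName);
```

With a plain value type, a result of 0 is ambiguous: it can mean that no match was found or that the matching element is 0. Projecting to a nullable type, as above, is one way to tell the two cases apart.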

Last and LastOrDefault

The Last and LastOrDefault operators are complements of First and FirstOrDefault. The former have signatures and behaviors that mirror the latter:

 public static T Last<T>(
     this IEnumerable<T> source);
 public static T Last<T>(
     this IEnumerable<T> source,
     Func<T, bool> predicate);
 public static T LastOrDefault<T>(
     this IEnumerable<T> source);
 public static T LastOrDefault<T>(
     this IEnumerable<T> source,
     Func<T, bool> predicate);

These methods work like First and FirstOrDefault. The only difference is that they select the last element in source instead of the first.
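A brief sketch, assuming a hypothetical array of integers, shows the mirrored behavior:

```csharp
using System;
using System.Linq;

// Hypothetical data, used only for illustration.
int[] numbers = { 4, 7, 10, 3 };

var last = numbers.Last();                            // 3
var lastEven = numbers.Last(n => n % 2 == 0);         // 10
var lastOver50 = numbers.LastOrDefault(n => n > 50);  // 0, that is, default(int)

Console.WriteLine($"{last}, {lastEven}, {lastOver50}");
```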

Single

Whenever you need to select a specific and unique item from a source sequence, you can use the operators Single or SingleOrDefault:

 public static T Single<T>(
     this IEnumerable<T> source);
 public static T Single<T>(
     this IEnumerable<T> source,
     Func<T, bool> predicate);

If no predicate is provided, Single returns the only element of the source sequence. Otherwise, it returns the only element that verifies the predicate. If there is no predicate and the source sequence is empty or contains more than one item, an InvalidOperationException error will be thrown. If there is a predicate and there are no matching elements or there is more than one match in the source, the method will throw an InvalidOperationException error, too. You can see some examples in Listing 4-51.

Listing 4-51: Examples of the Single operator syntax

  // returns Product 1
  var item = products.Single(p => p.IdProduct == 1);
  Console.WriteLine(item == null ? "null" : item.ToString());

  // InvalidOperationException
  item = products.Single();
  Console.WriteLine(item == null ? "null" : item.ToString());

  // InvalidOperationException
  IEnumerable<Product> emptyProducts = Enumerable.Empty<Product>();
  item = emptyProducts.Single(p => p.IdProduct == 1);
  Console.WriteLine(item == null ? "null" : item.ToString());

SingleOrDefault

The SingleOrDefault operator provides a default result value in the case of an empty sequence or no matching elements in source. Its signatures are like those for Single:

 public static T SingleOrDefault<T>(
     this IEnumerable<T> source);
 public static T SingleOrDefault<T>(
     this IEnumerable<T> source,
     Func<T, bool> predicate);

The default value returned by this method is default(T), as in the FirstOrDefault and LastOrDefault extension methods.

Important

The default value is returned only if no elements match the predicate. An InvalidOperationException error is thrown when the source sequence contains more than one matching item.
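The difference between "no match" and "more than one match" can be sketched as follows. The array of identifiers is hypothetical and deliberately contains a duplicate.

```csharp
using System;
using System.Linq;

// Hypothetical identifiers; the value 3 appears twice on purpose.
int[] ids = { 1, 2, 3, 3 };

var one = ids.Single(id => id == 1);                 // exactly one match: 1
var missing = ids.SingleOrDefault(id => id == 42);   // no match: default(int), 0

string duplicate;
try
{
    // Two elements match, so even SingleOrDefault throws.
    ids.SingleOrDefault(id => id == 3);
    duplicate = "unique";
}
catch (InvalidOperationException)
{
    duplicate = "more than one match";
}

Console.WriteLine($"{one}, {missing}, {duplicate}");
```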

ElementAt and ElementAtOrDefault

Whenever you need to extract a specific item from a sequence based on its position, you can use the ElementAt or ElementAtOrDefault method:

 public static T ElementAt<T>(
     this IEnumerable<T> source,
     int index);
 public static T ElementAtOrDefault<T>(
     this IEnumerable<T> source,
     int index);

The ElementAt method requires an index argument that represents the position of the element to extract. The index is zero based; therefore, you need to provide a value of 2 to extract the third element. When the value of index is negative, or greater than or equal to the number of items in the source sequence, an ArgumentOutOfRangeException error is thrown. The ElementAtOrDefault method differs from ElementAt because in those same cases it returns default(T), which is null for reference types and nullable types, instead of throwing. Listing 4-52 shows some examples of how to use these operators.

Listing 4-52: Examples of the ElementAt and ElementAtOrDefault operator syntax

  // returns Product 2
  var item = products.ElementAt(2);
  Console.WriteLine(item == null ? "null" : item.ToString());

  // returns null
  item = Enumerable.Empty<Product>().ElementAtOrDefault(6);
  Console.WriteLine(item == null ? "null" : item.ToString());

  // returns null
  item = products.ElementAtOrDefault(6);
  Console.WriteLine(item == null ? "null" : item.ToString());
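The boundary cases are easy to get wrong, so here is a small sketch over a hypothetical three-item list of strings (not the chapter's products collection). The valid indexes are 0 through 2.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical three-item list: valid indexes are 0, 1, and 2.
var items = new List<string> { "first", "second", "third" };

var third = items.ElementAt(2);               // "third"
var beyond = items.ElementAtOrDefault(3);     // null: index equals the item count
var negative = items.ElementAtOrDefault(-1);  // null: negative index

string outcome;
try
{
    items.ElementAt(3);   // same index, but ElementAt throws
    outcome = "ok";
}
catch (ArgumentOutOfRangeException)
{
    outcome = "out of range";
}

Console.WriteLine($"{third}, {beyond ?? "null"}, {negative ?? "null"}, {outcome}");
```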

DefaultIfEmpty

DefaultIfEmpty returns a default element for an empty sequence:

 public static IEnumerable<T> DefaultIfEmpty<T>(
     this IEnumerable<T> source);
 public static IEnumerable<T> DefaultIfEmpty<T>(
     this IEnumerable<T> source,
     T defaultValue);

When the source sequence contains items, DefaultIfEmpty returns them unchanged. In the case of an empty source, it returns a sequence containing a single element: default(T) in the first overload, or defaultValue if you use the second overload of the method.

Defining a specific default value can be helpful in many circumstances. For instance, imagine that you have a public static property named Empty, used to return an empty instance of a Customer:

 public static Customer Empty {
     get {
         Customer empty = new Customer();
         empty.Name = String.Empty;
         empty.Country = Countries.Italy;
         empty.City = String.Empty;
         empty.Orders = (new List<Order>(Enumerable.Empty<Order>())).ToArray();
         return(empty);
     }
 }

Sometimes this is useful, especially when unit testing code. Another situation is when a query uses GroupJoin to implement a left outer join. The null values that might result can be replaced by a default value chosen by the query author.

In Listing 4-53, you can see how to use DefaultIfEmpty, optionally with a custom default value such as Customer.Empty.

Listing 4-53: Example of the DefaultIfEmpty operator syntax, both with default(T) and a custom default value

  var expr = customers.DefaultIfEmpty();

  var noCustomers = Enumerable.Empty<Customer>(); // Empty sequence
  IEnumerable<Customer> customersEmpty =
      noCustomers.DefaultIfEmpty(Customer.Empty);
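The left outer join scenario mentioned above can be sketched as follows. The tuple-based customers and orders are hypothetical stand-ins for the chapter's types, and the "(none)" placeholder plays the role that Customer.Empty plays in Listing 4-53.

```csharp
using System;
using System.Linq;

// Hypothetical data: the second customer has no orders.
var customers = new[] { (Id: 1, Name: "Paolo"), (Id: 2, Name: "Marco") };
var orders = new[] { (CustomerId: 1, Product: "Book") };

// Left outer join: 'join ... into' (a GroupJoin) followed by DefaultIfEmpty,
// which supplies a placeholder order for customers without any orders.
var report =
    (from c in customers
     join o in orders on c.Id equals o.CustomerId into customerOrders
     from co in customerOrders.DefaultIfEmpty((CustomerId: c.Id, Product: "(none)"))
     select $"{c.Name}: {co.Product}")
    .ToList();

// On an empty source, the result is a one-element sequence.
var emptyDefault = Enumerable.Empty<int>().DefaultIfEmpty().ToArray();   // { 0 }
var emptyCustom = Enumerable.Empty<int>().DefaultIfEmpty(-1).ToArray();  // { -1 }

report.ForEach(Console.WriteLine);
```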

Other Operators

To complete our coverage of LINQ query operators, we describe a few final extension methods in this section.

Concat

The first one is the concatenation operator, named Concat. As its name suggests, it simply appends a sequence to another, as we can see from its signature:

 public static IEnumerable<T> Concat<T>(
     this IEnumerable<T> first,
     IEnumerable<T> second);

The only requirement for Concat arguments is that they enumerate the same type T. We can use this method to append any IEnumerable<T> sequence to another of the same type. Listing 4-54 shows an example of customer concatenation.

Listing 4-54: The Concat operator, used to concatenate Italian customers with customers from the United States

  var italianCustomers =
    from   c in customers
    where  c.Country == Countries.Italy
    select c;

  var americanCustomers =
    from   c in customers
    where  c.Country == Countries.USA
    select c;

  var expr = italianCustomers.Concat(americanCustomers);
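Concat keeps every item from both sequences, in order, including duplicates. The sketch below contrasts it with Union on two hypothetical arrays of names that share one value.

```csharp
using System;
using System.Linq;

// Hypothetical data: "Marco" appears in both arrays.
string[] firstGroup = { "Paolo", "Marco" };
string[] secondGroup = { "James", "Frank", "Marco" };

var appended = firstGroup.Concat(secondGroup).ToArray();  // 5 items, "Marco" twice
var merged = firstGroup.Union(secondGroup).ToArray();     // 4 items, duplicates removed

Console.WriteLine(string.Join(", ", appended));
Console.WriteLine(string.Join(", ", merged));
```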

SequenceEqual

Another useful operator is the equality operator, which corresponds to the SequenceEqual extension method:

 public static bool SequenceEqual<T>(
     this IEnumerable<T> first,
     IEnumerable<T> second);
 public static bool SequenceEqual<T>(
     this IEnumerable<T> first,
     IEnumerable<T> second,
     IEqualityComparer<T> comparer);

This method compares each item in the first sequence with the corresponding item in the second sequence. If the two sequences have exactly the same number of items with equal items in every position, the two sequences are considered equal. Keep in mind that items of a reference type are compared by reference unless the type defines its own equality. You can override GetHashCode and Equals to drive the result of this operator, or you can use the second method overload, providing a custom implementation of IEqualityComparer<T>.
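The following sketch illustrates these rules with hypothetical arrays. It uses StringComparer.OrdinalIgnoreCase as a ready-made IEqualityComparer<string>, and nested arrays as a simple case of reference-type items that do not override Equals.

```csharp
using System;
using System.Linq;

// Hypothetical data, used only for illustration.
string[] first = { "Italy", "USA" };
string[] second = { "italy", "usa" };
string[] reordered = { "USA", "Italy" };

bool sameDefault = first.SequenceEqual(second);       // false: default string equality is case sensitive
bool sameIgnoreCase = first.SequenceEqual(second, StringComparer.OrdinalIgnoreCase);  // true
bool sameReordered = first.SequenceEqual(reordered);  // false: position matters

// Arrays do not override Equals, so the inner arrays are compared by reference.
int[][] x = { new[] { 1, 2 } };
int[][] y = { new[] { 1, 2 } };
bool sameRefs = x.SequenceEqual(y);                   // false

Console.WriteLine($"{sameDefault}, {sameIgnoreCase}, {sameReordered}, {sameRefs}");
```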