Dealing with Distributed Environments | Expert C# 2008 Business Objects

Whether we're generating a report based on thousands or millions of rows of data, or we're running a batch process that affects or analyzes billions of rows of data, distributed environments are problematic. This is true whether we're using data-centric or object-oriented application architectures.

Mainframe- and minicomputer-based applications have always excelled at this kind of work, because the application generating the report or doing the batch process is on the same machine as the data. Of course, it doesn't hurt that both mainframe and minicomputer operating systems come with powerful, queued processing engines, so reports and batch processing can be run in the background, at scheduled times, and so forth.

Conversely, PC- and web-based computing have always been weak at these tasks, primarily because of their distributed nature. Moving those thousands or millions of rows across the network from the database server to a client machine, application server, or web server is incredibly expensive, and the outcome is almost always inferior to what centralized computing systems achieve. It doesn't help that neither PC nor web environments include queued processing engines, so we almost always resort to having the user launch these tasks manually, or to hacking something together with the Windows AT command, the Windows Task Scheduler, or SQL Server's job scheduler.

The keys to success are twofold. First, we need to avoid moving massive amounts of data as much as possible. Second, we need some mechanism by which report-generation and batch-processing tasks can be run automatically, probably from a queue-processing engine of some sort.

Avoiding Data Transfer

In most environments, the database server is already a bottleneck, so running additional processing on that machine is often taboo. Yet that is exactly the ideal solution when we're dealing with large volumes of data. If we can avoid transferring the data across the network, we can come close to the power of the mainframe/minicomputer environments.

More likely, however, we'll have to move the data across the network at least once to get it onto a machine where we can process the information without tying up yet more database server resources. Of course, the simple act of copying thousands or millions of rows of data across the network is problematic in terms of performance, regardless of the actual processing activity we're performing.

The point here is that we have a classic trade-off between using database server resources and taking a performance hit by moving the data across the network. If you can run the processing on the database server, you can gain some serious benefits; otherwise, you'll have to pay the data-transfer cost.

The most important point here is that the data shouldn't be transferred any further than absolutely necessary, and even then only the minimum required data should be transferred. Ideally, our report generation and batch processing will occur on a server machine that's physically near the database server, and which is connected by the highest network-connection speed possible. The worst thing we can do is transfer all that data out to some client workstation, across who knows how much network infrastructure.
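
To get a feel for why both volume and link speed matter, a rough estimate helps. The following is a minimal sketch rather than a measurement: the TransferEstimate class, its parameter names, and any figures fed into it are hypothetical, and the calculation deliberately ignores latency, protocol overhead, and contention, so real transfers will take longer.

```csharp
using System;

// Back-of-envelope estimate of the time needed to move a result set
// across a network link. The class and parameter names are hypothetical.
public static class TransferEstimate
{
    // Returns the ideal transfer time for rowCount rows of bytesPerRow bytes
    // over a link of the given speed (in megabits per second).
    // Latency, protocol overhead, and contention are ignored.
    public static TimeSpan Estimate(long rowCount, int bytesPerRow, double linkMegabitsPerSecond)
    {
        if (rowCount < 0) throw new ArgumentOutOfRangeException("rowCount");
        if (bytesPerRow < 0) throw new ArgumentOutOfRangeException("bytesPerRow");
        if (linkMegabitsPerSecond <= 0) throw new ArgumentOutOfRangeException("linkMegabitsPerSecond");

        double totalBits = (double)rowCount * bytesPerRow * 8.0;
        double bitsPerSecond = linkMegabitsPerSecond * 1000000.0;
        return TimeSpan.FromSeconds(totalBits / bitsPerSecond);
    }
}
```

Under these assumptions, TransferEstimate.Estimate(5000000, 400, 10) comes to 1,600 seconds (nearly 27 minutes) for five million 400-byte rows over a 10 Mbps link. Trimming each row to 100 bytes and moving it over a 1,000 Mbps link, Estimate(5000000, 100, 1000), brings the ideal figure down to 4 seconds.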

Providing Background Processing

Another significant issue when trying to do report generation and batch processing on a server machine is that we don't have any standard mechanism by which to launch or control such processing. The Windows and .NET environments don't include the kind of queue-processing engines that you'd find on a mainframe or minicomputer.

So, what do we have? Well, there's the Windows AT command, which provides some primitive abilities to schedule tasks for processing. And then there's SQL Server's job scheduler, which many people use (or misuse) for this purpose. There's MSMQ, but that has no provisions for scheduling jobs, just for queuing and running them as fast as possible. None of these solves all of the following core issues we face:

Allowing the user to request and/or schedule a task on a server
Allowing a user to remove a pending task from a server
Monitoring and viewing the scheduled, pending, and active tasks on a server
Hosting (running) the task in a process on the server
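
The four issues above can be sketched as a very small in-memory queue. This is only an illustration under stated assumptions, not the engine built later in this chapter: every type and member name here (BatchTaskState, BatchTaskInfo, InMemoryBatchQueue, Submit, Remove, GetTasks, RunDue) is hypothetical, nothing is persisted, and a real host would have to call RunDue periodically, for instance from a Windows service.

```csharp
using System;
using System.Collections.Generic;

// All type and member names here are hypothetical, chosen for illustration.
public enum BatchTaskState { Pending, Active, Completed, Failed }

// Read-only snapshot of a task, used for monitoring.
public sealed class BatchTaskInfo
{
    public readonly Guid Id;
    public readonly string Name;
    public readonly DateTime RunAt;
    public readonly BatchTaskState State;

    public BatchTaskInfo(Guid id, string name, DateTime runAt, BatchTaskState state)
    {
        Id = id; Name = name; RunAt = runAt; State = state;
    }
}

public sealed class InMemoryBatchQueue
{
    private sealed class Entry
    {
        public Guid Id;
        public string Name;
        public DateTime RunAt;
        public BatchTaskState State;
        public Action Work;
    }

    private readonly object _sync = new object();
    private readonly List<Entry> _entries = new List<Entry>();

    // Issue 1: let a caller request or schedule a task.
    public Guid Submit(string name, DateTime runAt, Action work)
    {
        if (work == null) throw new ArgumentNullException("work");
        Entry entry = new Entry();
        entry.Id = Guid.NewGuid();
        entry.Name = name;
        entry.RunAt = runAt;
        entry.State = BatchTaskState.Pending;
        entry.Work = work;
        lock (_sync) { _entries.Add(entry); }
        return entry.Id;
    }

    // Issue 2: remove a task, but only while it is still pending.
    public bool Remove(Guid id)
    {
        lock (_sync)
        {
            return _entries.RemoveAll(
                delegate(Entry e) { return e.Id == id && e.State == BatchTaskState.Pending; }) > 0;
        }
    }

    // Issue 3: report every known task and its current state.
    public IList<BatchTaskInfo> GetTasks()
    {
        lock (_sync)
        {
            List<BatchTaskInfo> result = new List<BatchTaskInfo>();
            foreach (Entry e in _entries)
                result.Add(new BatchTaskInfo(e.Id, e.Name, e.RunAt, e.State));
            return result;
        }
    }

    // Issue 4: host the work. Runs every pending task that is due at 'now'
    // and returns how many were started (including any that failed).
    public int RunDue(DateTime now)
    {
        List<Entry> due = new List<Entry>();
        lock (_sync)
        {
            foreach (Entry e in _entries)
            {
                if (e.State == BatchTaskState.Pending && e.RunAt <= now)
                {
                    e.State = BatchTaskState.Active;
                    due.Add(e);
                }
            }
        }
        // Run outside the lock so long tasks don't block Submit/Remove/GetTasks.
        foreach (Entry e in due)
        {
            BatchTaskState outcome = BatchTaskState.Completed;
            try { e.Work(); }
            catch (Exception) { outcome = BatchTaskState.Failed; }
            lock (_sync) { e.State = outcome; }
        }
        return due.Count;
    }
}
```

Passing the current time into RunDue, rather than reading the clock inside it, is a deliberate choice in this sketch: it keeps the scheduling decision easy to test. Even at this size, the gaps are visible (no persistence, no security, no retry, no process isolation), which hints at why a comprehensive engine is a substantial piece of work.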

There are a variety of freeware, shareware, and commercial batch-processing engines available for the Windows environment. However, most organizations that require this kind of functionality end up creating their own batch-processing mechanism, and following that trend, we'll be creating a basic one for .NET later in this chapter.

Tip	I recommend looking at commercial alternatives, because creating a comprehensive batch-processing engine isn't trivial. The one we'll create in this chapter provides minimum functionality. If you need something more comprehensive, it could be more cost effective to purchase a commercial product.