The Next Development Platform


The .NET platform was a huge initiative for Microsoft, and the company plans to continue in that direction with a whole line of products focused around the .NET Framework. Microsoft hopes to make development easier for developers while at the same time expanding what can be done with the applications they build. Table 9.1 lists the main components of what Microsoft calls the "Next Development Platform."

The plan is to release the next generation of Visual Studio .NET (code name Whidbey) by 2005. This version will offer many productivity advantages for developers and will integrate tightly with the next version of SQL Server, SQL Server 2005. This new version of SQL Server represents a major upgrade of the product and includes an overhauled version of Analysis Services.

Analysis Services 2005

On the near horizon, SQL Server 2005 (code name Yukon) offers some wonderful upgrades in the area of Analysis Services. The biggest improvements in Analysis Services 2005 are changes to the interface, the inclusion of five new mining algorithms, and the ability to mine data in "real time."

The basic concepts of data mining remain. You still create an Analysis Services database from either an OLAP cube or a relational database. Mining models are initially built using a wizard and then refined with an editor. Mining models are trained against sample data, and developers still need to make sure they are working with clean data. You can utilize Analysis Services 2000 and create some very usable applications even if your company is not ready to move immediately to SQL Server 2005. You can then migrate the application to Analysis Services 2005 when it becomes available.

What has changed is that working with Analysis Services should be quicker and easier. Also, you now have the ability to take advantage of some significant upgrades. Instead of abandoning or neglecting data mining, Microsoft has invested some serious time and effort in improving its capabilities. It has added additional algorithms that expand the types of data-mining solutions developers can create.

This section will present some of the new data-mining features expected with the release of SQL Server 2005. It will also discuss migrating an existing 2000 mining model to the new version.

Note

The information in this section is based on the SQL Server 2005 Beta 2 release version. Some things may change with the final release.


A New Interface

Perhaps the most noticeable difference between Analysis Services 2000 and 2005 is the interface. Database administrators and developers no longer need separate tools to manage SQL Server databases and Analysis Services databases. You can now view both from a single tool known as SQL Server Management Studio.

For developers creating data-mining solutions, Analysis Services projects exist inside a Visual Studio solution file just like any other project. This environment, referred to as the Business Intelligence (BI) Development Studio, is quite convenient: everything is in one place, and you get access to a familiar and consistent interface.

Like the 2000 version, Analysis Services 2005 includes wizards that allow you to create a data source and a mining model. However, you now have the ability to define a Data Source View (DSV). This is a virtual view of the actual data and can be used to specify computed columns. In Chapter 5, we had to reference a view inside SQL Server 2000 named vw_shipments. With Analysis Services 2005, you can create a virtual column known as a named calculation. This is good for keeping the code used in your data-mining solution separate from the actual database. Instead of having to store special tables or views inside the relational database, you can utilize the data source views.
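The idea behind a named calculation can be sketched in a few lines: a derived column computed on the fly from the source rows, without ever being stored in the underlying database. The sketch below is a generic Python illustration of that concept, not the Analysis Services API; the column names (`quantity`, `unit_price`, `line_total`) are invented for the example.

```python
# Hypothetical source rows, as a DSV might expose them from a shipments table.
shipments = [
    {"quantity": 10, "unit_price": 2.50},
    {"quantity": 4,  "unit_price": 12.00},
]

def with_line_total(rows):
    """Add a computed 'line_total' column without altering the source rows,
    the way a named calculation adds a virtual column to a DSV table."""
    return [{**row, "line_total": row["quantity"] * row["unit_price"]}
            for row in rows]
```

The source data stays untouched; the calculation lives entirely in the "view" layer, which is exactly the separation a named calculation gives you.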

A DSV is also useful when you have a database that contains hundreds or thousands of tables. The DSV only needs to include the tables you are interested in mining. You can also use it to select data from multiple data sources, such as databases on other servers or even text files.

Analysis Services 2005 comes with twelve views that can be embedded into your Visual Studio .NET application as Windows Forms controls. The views allow you to create and edit mining models and give the developer a different way of visualizing the results.

The new version includes a query editor that resembles SQL Server's Query Analyzer. It also includes a query builder with a Microsoft Access-style interface. Queries are performed using the Multidimensional Expressions (MDX) language, which has been enhanced in Analysis Services 2005. This enables you to build a mining model and then use the query tools to extract meaningful information from the results. It can be quite useful if you are starting from scratch with a database and do not know what you are trying to predict.

New Mining-Model Algorithms

Analysis Services 2005 features seven data-mining algorithms, five more than in the 2000 version. In addition, the original two algorithms, decision trees and clustering, have been updated.

The new algorithms are listed below:

  1. Association rules: Used to create a set of rules for making predictions. Most useful for predictions against large amounts of transactional sales data.

  2. Time series: Used to predict trends; can be useful when working with financial data, such as stock prices.

  3. Naive Bayes: Used only against noncontinuous variables (for example, a product name) and therefore performs very quickly.

  4. Sequence clustering: In addition to grouping similar data into clusters, it uses sequence analysis to determine the order in which events occur.

  5. Neural nets: Based on an AI technique, this is useful for determining things like whether a customer is a good or bad risk. It is the most thorough algorithm and therefore the most time-consuming.
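To make the Naive Bayes entry above concrete, here is a minimal sketch of the underlying technique on discrete attributes: count value frequencies per class, then pick the class that maximizes prior times per-attribute likelihoods. This is a generic illustration of the algorithm, not Microsoft's implementation; the toy data is invented.

```python
from collections import Counter, defaultdict

def train(samples):
    """Count class frequencies and per-class attribute-value frequencies.
    samples: list of (attributes_dict, label) pairs with discrete values."""
    class_counts = Counter(label for _, label in samples)
    attr_counts = defaultdict(Counter)  # (label, attr) -> Counter of values
    for attrs, label in samples:
        for attr, value in attrs.items():
            attr_counts[(label, attr)][value] += 1
    return class_counts, attr_counts

def predict(model, attrs):
    """Pick the label maximizing P(label) * product of P(value | label),
    with add-one smoothing so unseen values do not zero out a class."""
    class_counts, attr_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, 0.0
    for label, count in class_counts.items():
        score = count / total
        for attr, value in attrs.items():
            counts = attr_counts[(label, attr)]
            score *= (counts[value] + 1) / (sum(counts.values()) + len(counts) + 1)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because only counting is involved (no iterative optimization), the approach is fast, which is why the algorithm above is described as performing very quickly.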

Mining Data in Real Time

Analysis Services 2005 will continue to support the processing method used in the 2000 version. This is a "pull" method in which the data used to process the model is pulled from the data source at processing time. In Chapter 6, data was refreshed on a daily basis. For most situations this is acceptable, since data mining is generally used to extract meaning from historical data that does not change much. Also, mining involves looking for trends in the data, not querying for specific values.

With the new version, you can now use a push method to retrieve data from a Data Transformation Services (DTS) package or a custom application. Another option, in between the two, is to use a proactive cache when you are working with data from an OLAP data source. In this scenario, data is refreshed based on predefined parameters, such as the amount of time between data pulls.
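The proactive-cache idea, refreshing data only when it is older than a predefined interval, can be sketched in a few lines. This is a generic illustration of the caching pattern, not the Analysis Services feature itself; all names and the clock-injection parameter are invented for the example.

```python
import time

class ProactiveCache:
    """Minimal sketch of a proactive cache: serve cached data, re-pulling
    from the source only when the cache is older than max_age seconds."""

    def __init__(self, pull, max_age_seconds, clock=time.monotonic):
        self.pull = pull                  # callable that fetches fresh data
        self.max_age = max_age_seconds    # allowed time between data pulls
        self.clock = clock                # injectable for testing
        self._data, self._fetched_at = None, None

    def get(self):
        now = self.clock()
        if self._fetched_at is None or now - self._fetched_at > self.max_age:
            self._data, self._fetched_at = self.pull(), now
        return self._data
```

The parameter that matters here is `max_age_seconds`: it plays the role of the predefined refresh interval described above, sitting between a pure pull (refetch every time) and a pure push (source notifies on change).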

Migrating a Mining Model Created with SQL Server 2000

A migration wizard included with Analysis Services 2005 allows you to migrate a mining database created with Analysis Services 2000. You still have to process the mining models once they are migrated, but at least you do not have to recreate them from scratch. As of the Beta 2 version, the limitations on migrating cubes include the inability to migrate remote partitions and linked cubes. Linked cubes have been replaced with linked measure groups.

Individual mining models can also be copied to Analysis Services 2005 by using a PMML (Predictive Model Markup Language) query. You can then create a mining model in Analysis Services 2005 by using the Create Mining Model statement and referencing the PMML retrieved. This method does not copy the bindings, though, so you will only be able to view the content in Analysis Services 2005 and not be able to reprocess the model. This could be useful, however, if you want a quick way to view the results of an old mining model using the new tools in Analysis Services 2005.

Longhorn

Longhorn is the code name for the next major Microsoft operating system. The final client version is not expected to arrive before 2006, but when it does, it should offer computing advances that can be utilized by many intelligent applications (the server version is expected sometime in 2007).

The client version will include two pillars, known as Avalon and Indigo. Avalon is the presentation layer, and Indigo is the communications layer. The initial release should be followed by a beta release of a new storage system known as WinFS. WinFS represents a significant change in the way data is organized and accessed by applications.

Note

The information in this section is based on the Longhorn alpha release version. Some things will undoubtedly change with the final release.


Avalon

Longhorn will use a new markup language known as the Extensible Application Markup Language (XAML). XAML is similar to HTML in that you can control the layout of text and controls on a page, but it also allows you to add procedural code using languages like C#, Visual Basic .NET, and JScript .NET. Procedural code mixed with XAML will function much like the code-behind files used in .NET today. A simple application can consist of XAML alone, or an application can consist of both XAML and procedural code (the type most developers will create).

A significant change for developers is the introduction of the application model. The new model provides a single programming model for creating different types of applications. Developers will be able to create applications that take advantage of the best features of Web- and Windows-based applications. Thus you will be able to create applications that can be deployed easily like current Web applications, but can also run offline like current Windows applications. Both Web and desktop applications will look and function essentially the same. The biggest difference will pertain to where the code resides.

Blurring the lines between application-development methods will eliminate the need for certain tradeoffs, such as having a rich user interface versus an easy deployment scenario. This should create more opportunities for developers to implement enhanced computing techniques.

Indigo

Indigo provides for secure communication among applications and is a key piece of the seamless computing vision. It incorporates Web services and allows you to communicate in one of two ways:

  • Stateless: The less reliable method, representing the way Web services are typically utilized today.

  • Stateful: A session exists between the sender and the receiver so that the communication is stable and secure. With sessions you can specify exactly how a message should be received, which is especially important when Web services send information over unreliable networks. For example, even if a connection is interrupted, a stateful session can pick up the communication later when the connection is available.

Secure and reliable communication is critical for most businesses. It is also important for agent-based applications in which data is available remotely or there is a need to communicate with other agents. Therefore, the ability to improve the way Web services are implemented is an important next step for enhanced computing.

WinFS

Computer storage capacity is increasing by leaps and bounds; a typical user's hard drive is now well over 100 GB, and modern applications are quickly filling the unused space as more and more files and file types are added to the mix. WinFS, which stands for Windows Future Storage, is the new storage subsystem. It departs from the typical way of storing data as files inside folders. WinFS will not be included in the initial client release of Longhorn, but it will be incorporated later and represents an important piece of the puzzle.

WinFS will use a query language known as OPath to locate information. It also allows you to relate items and therefore make them more meaningful. Preliminary documentation gives an example in which a sales proposal is related to the salesperson and the fiscal sales quarter it was created in. This relationship could then be used when searching for proposals from a certain quarter.
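The relationship-based search described above can be sketched generically: instead of locating a document by its folder path, you query items by the things they are related to. The sketch below is a plain Python illustration of that idea, not OPath or the WinFS API; the item store and field names are invented.

```python
# Hypothetical item store: each proposal is related to a salesperson
# and to the fiscal quarter in which it was created.
proposals = [
    {"title": "Contoso deal", "salesperson": "Ann", "quarter": "FY05-Q2"},
    {"title": "Fabrikam bid", "salesperson": "Bob", "quarter": "FY05-Q3"},
    {"title": "Tailspin RFP", "salesperson": "Ann", "quarter": "FY05-Q3"},
]

def find_proposals(items, quarter=None, salesperson=None):
    """Filter items by their related quarter and/or salesperson,
    the way a WinFS query could follow item relationships
    rather than folder paths."""
    return [
        item for item in items
        if (quarter is None or item["quarter"] == quarter)
        and (salesperson is None or item["salesperson"] == salesperson)
    ]
```

Searching "proposals from a certain quarter" then becomes a query over relationships rather than a walk through a folder hierarchy.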

WinFS Rules

The platform known as WinFS Rules is one of the most interesting parts of WinFS for enhanced computing. With WinFS Rules, you can create rules that let users personalize their experience, and developers can build their own rule-based or expert-based applications. In this case, the expert is the user, who knows which documents are most important and how the information should be presented. Admittedly, the rules apply only to data stored in WinFS and not to other areas, but in many cases this may be enough. After all, managing the growing number of files and messages we accumulate is quite a task.

WinFS rules follow a structure similar to the one governing the rules presented in Chapter 7. A rule is made up of a condition and a result. The condition is a Boolean function, which may itself combine multiple subconditions. Multiple rules can be grouped together into what is known as a rule set. Decision points are used to limit the number of options exposed to the user when defining rules; otherwise, the user would be overwhelmed by the number of potential conditions and results that could make up a rule.
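The condition/result structure just described can be sketched as a tiny rule engine. This is a generic illustration of the pattern, not the WinFS Rules API; the class names and the mail-filtering data are invented.

```python
class Rule:
    """A rule pairs a Boolean condition with a result."""

    def __init__(self, condition, result):
        self.condition = condition  # item -> bool; may combine subconditions
        self.result = result        # item -> value describing the outcome

def apply_rule_set(rules, item):
    """Run every rule in the set against an item; the results of all
    rules whose conditions hold are collected."""
    return [rule.result(item) for rule in rules if rule.condition(item)]

# A hypothetical rule set in the spirit of mail filtering: the user, acting
# as the "expert," decides what counts as junk and what is important.
mail_rules = [
    Rule(lambda m: "lottery" in m["subject"].lower(), lambda m: "junk"),
    Rule(lambda m: m["from"] == "boss@example.com", lambda m: "important"),
]
```

Grouping the two `Rule` objects into `mail_rules` corresponds to the rule set described above, and the restricted vocabulary of conditions (subject text, sender) plays the role of decision points: the user chooses from a small menu rather than from every possible predicate.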

Most Microsoft Outlook users are familiar with the rules used in filtering out junk mail. The interface for WinFS rules is expected to follow a style that resembles the way Outlook creates rules.

WinFS Notification Service

The WinFS Notification service is similar to the FileSystemWatcher class utilized in Chapter 8: it notifies you of specific changes to the file system. What is new is that you can set up long-term subscriptions, which allow you to continue receiving notifications even after an application has been restarted.

Mobility

Mobility is a critical consideration for most businesses today. Workforces are growing and becoming widely dispersed, and laptops and handhelds are slowly replacing desktops. The business world needs devices that are easy to use and easy to connect with, but devices must still offer complex functionality and deliver critical data as close to real time as possible. Microsoft recognizes the growing demand for smarter devices and addresses these needs in Longhorn.

Network awareness is one important need for mobile applications: applications can no longer assume that the user has access to a stable network connection. Longhorn will provide a single API, the Network Location Awareness (NLA) API. Developers can access all network parameters from this one place and intelligently determine whether the network is available; if it is not, the application needs to account for this with as little user intervention as possible.

Using the NLA, the developer can determine not only whether the network is available but also whether the user has a high- or low-bandwidth connection. The application can then make adjustments accordingly. This is especially important for applications that automatically initiate file transfers.
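The kind of adjustment an application might make with that network information can be sketched as a simple decision function. This does not show the NLA API itself; the function name, thresholds, and return values are all invented for illustration.

```python
def choose_transfer_plan(network_available, bandwidth_kbps):
    """Decide how to handle a pending file transfer based on network state.
    The 256 kbps threshold and chunk sizes are arbitrary example values."""
    if not network_available:
        # Queue the transfer for later instead of prompting the user.
        return {"action": "queue"}
    if bandwidth_kbps < 256:
        # Slow link: send in small chunks so interruptions cost little.
        return {"action": "send", "chunk_kb": 32}
    return {"action": "send", "chunk_kb": 512}
```

The point is the shape of the logic: the application degrades gracefully (queueing or shrinking chunks) rather than failing or interrupting the user when conditions are poor.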



Building Intelligent .NET Applications: Agents, Data Mining, Rule-Based Systems, and Speech Processing (2005)