Common Uses | Microsoft Content Management Server 2002: A Complete Guide

Web Services associated with a CMS Web Property are typically used for one of two purposes: syndication (publishing content to any platform) or aggregation (consuming content from any platform).

However, the sky is the limit. Anything that can be done via the CMS Publishing API (PAPI) in a stand-alone application could theoretically be made available via a Web Service. So, for instance, if we wanted to create a Web Service that would provide the name and GUID of all postings that expire in the next week, we could do that from any application that could call our Web Service. Further, if we wanted to create a Web Service that allowed our customer to change the ExpiryDate of one of those postings, we could do that. So, common use may lie in the eye of the beholder.
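The two hypothetical services just described can be sketched in a few lines. This is only an illustration of the logic, written in Python against an in-memory list; the Posting class, the function names, and the sample data are assumptions made for the sketch, not PAPI types. A real implementation would make the equivalent PAPI calls from inside a .NET Web Service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Posting:
    """Hypothetical stand-in for a CMS posting; not the real PAPI type."""
    name: str
    guid: str
    expiry_date: datetime


def postings_expiring_within(postings, now, days=7):
    """Return (name, guid) pairs for postings that expire in the next `days` days."""
    cutoff = now + timedelta(days=days)
    return [(p.name, p.guid) for p in postings if now <= p.expiry_date <= cutoff]


def change_expiry_date(postings, guid, new_expiry):
    """Set a new expiry date on the posting with the given GUID; True if found."""
    for p in postings:
        if p.guid == guid:
            p.expiry_date = new_expiry
            return True
    return False


# Made-up sample data for the sketch.
now = datetime(2003, 6, 1)
catalog = [
    Posting("PressRelease", "{A1}", datetime(2003, 6, 3)),
    Posting("Careers", "{B2}", datetime(2003, 9, 1)),
]
expiring = postings_expiring_within(catalog, now)
```

A caller would first ask for the expiring postings and then pass one of the returned GUIDs, along with a new date, to the second function.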

That said, we are going to focus on the two purposes mentioned at the beginning of this segment. From a technical perspective, these two scenarios are distinguished by whether the CMS system will be the publisher or the consumer of Web Services. If it is syndicating its data, it will be the publisher, and if it is aggregating, it will be the consumer.

Content Syndication

Syndication can be defined as a distribution of information to various destinations for repurposing or republishing in another context. Non-computer examples of syndication include the sale of a comic strip, column, television series, or movie for simultaneous republication in newspapers, periodicals, independent television stations, or theaters. The content typically remains the same, but the advertising surrounding the content, the placement or presentation of the content, or even the purpose of the content may be altered.

Syndication is accomplished by replicating data across systems, commonly done as a batch process running at a fixed time interval. This causes duplication of data across systems and creates a potential disconnect between the data being used by the destination applications and the up-to-date data held by the source system. This solution can be complex to manage but is often required to meet the business need.

Syndication can also be accomplished by creating a real-time interaction between applications. This can lead to fresh reuse of content without resorting to "screen scraping" or manual cutting and pasting. Usage of the syndicated content may create potential subscription fee revenue.

Far more scenarios exist (see the Decisions That Must Be Made section), but whatever method is employed, CMS syndicated content is typically only available when it has been approved for publishing.

Content Aggregation

Aggregation is the flip side of syndication. It can be defined as a collection of related information from various sources for repurposing or republishing in another context. Today aggregators typically gather syndicated content into an indexed, searchable, and potentially categorized collection of links to that content. That content can be displayed in a homogeneous user interface with all relevant data presented simultaneously. This allows a user to visit a single site and interact with content from many sites. Since large volumes of data can be automatically collected, the user can filter and search for just the content that they want to view. The information can be summarized or repurposed, as previously described. Also, since the aggregator collects new content continuously, the site can stay as fresh as its freshest syndicate. For large, effective aggregators, that can mean a Web site that changes very rapidly.

Decisions That Must Be Made

When content is to be syndicated, there are several decisions that must be made in order to determine the best means of sharing that content. These decisions can have a significant impact on the success of the syndication. These are not necessarily presented in any order, since the importance of each decision is dependent on the situation. What is a good choice in one situation may be a poor choice in another situation.

Redundant versus Centralized

It will be important to decide whether the content that is syndicated will be copied to the aggregator or kept in a central store. Making a redundant copy of the data requires a repository and space at the aggregator. If the content changes or is removed, the change may need to be reflected in the copy. This can become quite complex if there are multiple aggregators. That could certainly lead to synchronization issues and potentially lead to stale or even inaccurate content. If the content doesn't change (such as a comic strip), this is less of an issue.

There is also the potential for alternative utility of the aggregated data. Once data is gathered together from various sources, it can become quite compelling. For instance, there is a certain utility for our credit card company in knowing that we've never made a late payment. But when all our creditors aggregate their individual payment experiences into a credit bureau, that collective information has a utility that wasn't available to any single creditor before the data was centralized.

However, there is a compelling argument for having a single source of truth that all aggregators simply point to. Practically every Web site has links to other content. If every Web site were required to keep a copy of the information that they currently only point to, practically every Web site would become unmanageable within one day. Also, if the content needs a high level of security, spreading copies of it around will certainly increase the risk that it may be exposed. Using a single source will typically provide a more consistent set of information, especially if there are complex or proprietary algorithms that must be performed.

Scheduled versus Real Time

Almost all Web Properties have peak times when their hardware is pushed to its limit and other times when it is nearly idle. God willing, we spend more time at peak than at idle. However, it may be necessary to have information scheduled to be copied locally at times when the servers are not busy so that the content is available locally when the servers are busy. Sometimes, performance is key, especially if the data needed is voluminous. Imagine if we wrote a Web Service to retrieve all products sold at the local grocery in the last quarter so that we could show the top ten selling products to our customer. If someone is waiting at the other end of that query, they may want to go get a cup of coffee or have lunch. Getting that content the night before and interacting with local, potentially even summarized data would significantly improve the experience. Trend analysis or other complex calculations that require the context of other content are good candidates for scheduled syndication.
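The grocery example can be sketched as a nightly summarization step. The function name, the row layout, and the sample figures below are assumptions made for illustration; the point is only that the expensive reduction happens off-peak and the page later reads a small local result.

```python
from collections import Counter


def summarize_top_sellers(sales, top_n=10):
    """Nightly batch step: reduce raw (product, quantity) rows to the top sellers."""
    totals = Counter()
    for product, quantity in sales:
        totals[product] += quantity
    return totals.most_common(top_n)


# Hypothetical raw rows; in practice these would be retrieved from the source overnight.
raw_sales = [("milk", 40), ("bread", 25), ("milk", 10), ("eggs", 30)]

# The locally stored summary is what the page reads during peak hours.
cached_top_sellers = summarize_top_sellers(raw_sales, top_n=2)
```

During the day, the site would display cached_top_sellers without calling the remote service at all.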

However, sometimes, regardless of the performance hit, real-time, up-to-date information is key. Imagine if our stockbroker used last night's price for the stock we wanted to buy today, or if our bookie used yesterday's odds for the race we want to bet on today. Not good. These situations require the freshest information possible. However, they need good designs to prevent latency or, worse, unavailability. How many bets do you think the bookie will take if ten minutes before race time he loses access to the odds? We also might want to use real-time access if we only care about a small, unpredictable portion of a large data set. If our customer only looks at their balance once every six months, it would probably be overkill to schedule the move of every customer balance every 20 minutes so that it would be current when they asked. A real-time lookup on a central repository would likely be the best solution in that case.

Push versus Pull

Whether the content will be sent to the syndicate or retrieved by the aggregator can potentially be one of the most difficult decisions that must be made. Politics, turf, security, clout, relationship, profit, trust, the number of potential consumers, who holds the biggest stick, who had the idea, technical savvy, how often the content could change, and a whole host of other factors (including all the other factors discussed in this section) can influence this decision. If the content owner wants to control precisely what content is going to be syndicated, to whom, and when that content should be available, they will probably opt to push their content. The more complex those rules get, the more likely it will be pushed. The push will typically be synchronous with an event at the syndicate, and the aggregator will have little control over the timing of the receipt of that content. However, the aggregator will probably store that information redundantly for instant availability when it is requested by their customer, and the information will probably be as fresh as it can be.

On the other hand, if the interaction needs to be at the behest of the consumer, a pull model will need to be created. It wouldn't make any sense for a company to push their stock price out to every partner every time it changed. That content would most likely be pulled by the partner when it was needed. It also wouldn't make sense for a bank to push every transactional change out to all their branches in anticipation that one of their customers may walk into that branch. The branch would selectively pull that content when they needed it.
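The difference between the two models can be sketched with a single hypothetical content owner that supports both. The Syndicate class, its method names, and the sample values are assumptions for this sketch and do not correspond to any CMS API.

```python
class Syndicate:
    """Hypothetical content owner that supports both delivery models."""

    def __init__(self):
        self._content = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Push model: the owner decides who receives content and when."""
        self._subscribers.append(callback)

    def publish(self, key, value):
        """Store the content and push it to every subscriber immediately."""
        self._content[key] = value
        for callback in self._subscribers:
            callback(key, value)  # delivery is triggered by the owner's event

    def get(self, key):
        """Pull model: the consumer asks only when it needs the value."""
        return self._content.get(key)


syndicate = Syndicate()
local_copy = {}  # redundant store kept by a push-style aggregator
syndicate.subscribe(lambda key, value: local_copy.update({key: value}))

syndicate.publish("press-release", "Q2 results")
syndicate.publish("stock-price", 41.25)
syndicate.publish("stock-price", 41.50)

# A pull-style partner makes one request at the moment it needs the price.
pulled_price = syndicate.get("stock-price")
```

The push subscriber received three deliveries, two of them for a price it may never have displayed, while the pull consumer made a single request. That is the trade-off described above.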

Private versus Open Access

The decision about who will be allowed access to this content will typically be far more cut and dried. If we need to control who gains access to the content, it will need to be private. It may even be fee based (that is the compelling promise of Web Services for some industries). If Gartner (example of private content) started giving away its analysis to the public, their current revenue model would have to change. Likewise, if Google (example of open access content) started charging for each search, their current revenue model would have to change. In fact, neither would likely survive the change.

RSS versus Web Service

RSS (Really Simple Syndication, created by Netscape and championed by UserLand, or RDF Site Summary, based upon a Web standard for metadata called RDF) is an XML-based standard format that allows the syndication of lists of hyperlinks along with other information, or metadata, that helps viewers decide whether they want to follow the link. Any list-oriented content, such as news headlines, press releases, job listings, conference calendars, and rankings (like top ten lists), to name a few, is a good candidate for an RSS feed. Powerful tools exist for consuming aggregated RSS feeds, but most require an existing aggregator. Although there are thousands of RSS feeds on a wide variety of topics, there must be a groundswell of public interest in syndicating the same type of content before aggregating it makes sense. So, there is wide adoption in some sectors, while there is no adoption in others. We'd be more likely to find a deep and fresh channel about a rock star than we would about airline ticket prices.
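To make the format concrete, here is a minimal sketch that builds an RSS 2.0 document containing one channel and a list of items, each with a title, a link, and a description. The build_rss function, the example.com URLs, and the feed contents are hypothetical; only the element names (rss, channel, item, title, link, description) come from the RSS 2.0 format itself.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Build a minimal RSS 2.0 document from (title, link, description) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 expects every channel to carry a title, a link, and a description.
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item_title, item_link, item_description in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        ET.SubElement(item, "description").text = item_description
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_rss(
    "Press Releases",
    "http://www.example.com/press/",
    "Latest press releases (hypothetical feed)",
    [("New product launched", "http://www.example.com/press/launch.htm", "Teaser text")],
)
```

Each item carries only a teaser and a pointer back to the source, which is why the format suits list-oriented content.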

With Web Service syndication, anything is possible. That can actually make it more difficult to get a groundswell of people sharing content about the same thing. Although Web Services are based upon standards, their implementation is typically very proprietary. There are growing means by which aggregators can consolidate content from Web Services, but public adoption is more difficult because there isn't the same kind of strict standards with regard to content organization as there is with RSS. However, partner (B2B) adoption is high for proprietary solutions.

Partial versus Complete

This decision will likely be determined based upon the size of the content universe and the need for the aggregator to have the complete set of content available. As with push versus pull, a large number of external influencers could make this decision difficult. However, typically it will make sense to provide access either to a single item or segment of the content or to all of it. Certainly, our bank isn't going to allow us to access all the accounts when we pull transactional content into our accounting software. It may be all of our content, but it isn't all of the bank's content. If the data is relatively volatile, syndicating only the altered content would be impractical. It would probably make more sense to send the entire universe of data to a redundant store each evening. However, direct, real-time access to specific content could be the best way to handle content that tends to be highly volatile. If we take a cue from RSS, it may be good to syndicate pointers, URLs, or IDs to the actual content. The aggregator would then funnel the user through to the source if their interest was piqued with the teaser.

Add versus Add/Change/Delete

Like the examples given at the beginning of this discussion, some content never changes. Consider a comic strip: once it is syndicated, changes do not occur. The same is true of a newspaper column, press release, transcript, school report card, lab test results, and lots of other content. Content that doesn't change is obviously easier to syndicate. Once it is out, it needs no maintenance. It can live in as many repositories as it finds its way into because the source system never needs to alter it. This is by far the most prevalent kind of syndication in place today.

However, for other content, change or removal is likely or other maintenance may be necessary. If the syndicate must be privy to the whereabouts of its syndicated content at all times so that they can keep it current, they will likely need an add/change/delete strategy. In essence, transactional syndication will probably require an approach that allows the syndicate to maintain any redundant copies with methods that allow alteration and deletion of previously syndicated content.
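One way to picture an add/change/delete strategy is as a stream of change records that the syndicate sends and the aggregator applies to its redundant copy. The record layout (operation, content ID, body) and the function name below are assumptions chosen for this sketch, not a defined CMS format.

```python
def apply_changes(local_store, changes):
    """Apply (operation, content_id, body) records to a redundant local copy."""
    for operation, content_id, body in changes:
        if operation in ("add", "change"):
            local_store[content_id] = body
        elif operation == "delete":
            local_store.pop(content_id, None)  # ignore IDs we never received
        else:
            raise ValueError(f"unknown operation: {operation}")
    return local_store


aggregator_copy = {"p1": "Original press release"}
change_feed = [
    ("add", "p2", "New job listing"),
    ("change", "p1", "Corrected press release"),
    ("delete", "p2", None),
]
apply_changes(aggregator_copy, change_feed)
```

An add-only syndication needs just the first branch, which is part of why content that never changes is so much easier to syndicate.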

Horizontal or Vertical (Who Cares?)

Much ado is frequently made about whether a Web Service is horizontal or vertical to CMS. The evaluation is somewhat subjective and there isn't a significant benefit to knowing whether a Web Service is horizontal or vertical. Nevertheless, two questions need to be answered.

Is the Web Service interface generic enough to be used for any CMS application or not?
Does the consumer need intimate knowledge of the underlying CMS application or not?

Basically, if the interface is so specific that the consumer doesn't need to know much about your application, it is considered a vertical Web Service (exaggerated example of a vertical WebMethod: GetYahooPressReleasesForYear(string yearToRetrieve)). But if the interface is very generic, requiring the consumer to have an intimate knowledge of your CMS application information architecture, it is considered a horizontal Web Service because it is similar to PAPI in its style (exaggerated example of a horizontal WebMethod: GetPostings(string postingChannelPath, string postingClient, string postingType, string postingYear, string optionalPostingID)). Figure 33-1 attempts to illustrate these classifications pictorially.

Figure 33-1. Horizontal or vertical Web Service

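The contrast can also be sketched in code. The two functions below mirror the exaggerated WebMethod examples, transliterated into Python over a made-up in-memory list; the channel paths, client names, and titles are hypothetical, and the optional posting ID parameter is omitted for brevity.

```python
# Hypothetical in-memory postings: (channel_path, client, posting_type, year, title)
POSTINGS = [
    ("/Channels/News/Press", "Yahoo", "PressRelease", "2002", "Yahoo partnership"),
    ("/Channels/News/Press", "Yahoo", "PressRelease", "2003", "Yahoo renewal"),
    ("/Channels/News/Jobs", "Internal", "JobListing", "2003", "Editor wanted"),
]


def get_postings(channel_path, client, posting_type, year):
    """Horizontal style: generic, so the caller must know the information architecture."""
    return [
        title
        for path, owner, kind, posted, title in POSTINGS
        if (path, owner, kind, posted) == (channel_path, client, posting_type, year)
    ]


def get_yahoo_press_releases_for_year(year):
    """Vertical style: the site-specific details are fixed inside the service."""
    return get_postings("/Channels/News/Press", "Yahoo", "PressRelease", year)
```

In this sketch the vertical function is just the horizontal one with most of its arguments fixed in advance, so the consumer needs to supply nothing but the year.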

Web Service Security

Can I secure my Web Service? This is controversial at best, and many people do not think enough attention has been given to this topic. Web Services give anyone with access to the Internet the potentially published (UDDI, WSIL) and certainly discoverable (WSDL) detailed information necessary to invoke code on our publicly accessible servers. It stands to reason that we may want to restrict that access to specific people. We may even want to charge for it. We certainly want to minimize the likelihood that malicious attacker(s) could wreak havoc or worse on those servers.

Almost everything about this topic is well beyond the scope of this chapter. Suffice it to say that in the Microsoft environment that CMS runs in, all the customary means available to secure a traditional Web site are available to secure a Web Service. These include, but are not limited to, the following:

IIS virtual directory based security using Windows: Basic, Digest, or Integrated Windows authentication (NTLM or Kerberos) or restricting by IP address.
.NET Framework based security using Windows: Basic, Digest, or Integrated Windows authentication (NTLM or Kerberos) in conjunction with IIS, Microsoft Passport authentication, Forms, or Client Certificates.
Third-party digital certificate based security.
Windows file system based security using role- or user-based Access Control Lists (ACLs), which implies the use of Windows authentication.
SQL Server role- and user-based security. (Note that CMS uses a specific database user to access the CMS database, and the CMS database structure is internal to Microsoft and not documented for public consumption or alteration. Direct access to the CMS data structures is not supported by Microsoft.) However, it is possible to keep highly restricted data in a separate database and control access using built-in SQL Server security on that data store.

Of course, we could (and many do) build our own proprietary security methods, storing the credentials of valid users in LDAP, a database, or the file system and managing access to Web Services using code written specifically for this purpose.
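As a rough illustration of such a home-grown check, the sketch below stores a salted hash for each partner and compares it when a request arrives. The store layout, the function names, and the sample credentials are assumptions for this sketch; a dictionary stands in for LDAP, a database, or the file system, and the hashing uses Python's standard hashlib and hmac modules rather than anything specific to .NET or CMS.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a salted hash so that plain-text passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def make_user_store(users):
    """Build a {user: (salt, hash)} store; a stand-in for LDAP or a database table."""
    store = {}
    for user, password in users.items():
        salt = os.urandom(16)
        store[user] = (salt, hash_password(password, salt))
    return store


def is_authorized(store, user, password):
    """Return True only when the user exists and the password hash matches."""
    if user not in store:
        return False
    salt, expected = store[user]
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, hash_password(password, salt))


# Hypothetical partner credentials for the sketch.
partners = make_user_store({"partner-a": "s3cret"})
```

A Web Service method would call is_authorized before doing any work and reject the request when it returns False.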

Other means of protecting access to your assets could include the following:

Encryption: .NET provides strong encryption algorithms, or you can simply encrypt the entire communication via SSL.
Obfuscation: Don't publish, describe, or name your Web Service anything that would help a user know what to expect from it. Remove the WSDL, and require a private object structure that only your partners know. Use an IP address with no domain attached to it.
Restrict physical access: Configure hardware appliances such as a firewall, router, or NIC to only allow access under certain conditions. Keep the servers under lock and key and out of public areas.
Monitor: Implement an Intrusion Detection System (IDS) with rules that keep out known-bad packet requests (remember NIMDA), watch for patterns of abuse, and automatically block Denial of Service (DoS)-like attacks on any element of a Web Property, including a Web Service.

These are but a few of the techniques in the arsenal of security mechanisms that could be deployed to protect our Web Service.

For the code sample in this chapter, we will use IIS Integrated Windows authentication, populating the SoapHttpClientProtocol.Credentials property with the value stored in CredentialCache.DefaultCredentials for the currently logged-in user. This way we can be sure that security is not a hindrance to learning about the topic at hand. Since we are running these examples on a single box, the credentials for one system can easily be shared by the other system. We will, however, follow the best practice of creating a Web Service in a separate Web application from CMS so that different security settings can be applied.

Of course, some Web Services will need to be offered to the anonymous public, and that is possible too. CMS will need guest access enabled, the guest user will need the authority to do whatever the Web Service requires, and IIS will need anonymous access enabled.