Scalability Requirements

                 

 
Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 22. Example Scenario 3: Enterprise-Wide Solution


Given the nature of an enterprise deployment, and the variety of roles that individual solution components may play (for example, dedicated search servers, crawling servers, and so on), the lack of scalability of any one component may quickly become a limiting factor should the scope of the solution change. Such changes might include supporting new business groups, adding large numbers of new users across the enterprise, configuring tens of thousands of additional crawl sites, or adding specialized capabilities like search and collaboration. Scalability is about changing the functions or scope of the solution, perhaps adding incremental hardware or software resources, and still maintaining acceptable performance. It is about not having to toss out the current Portal and redesign from the ground up after requirements change.

In Global's case, planning for scalability means taking into account a bit of scope creep as well as planning for post-pilot growth, both in the number of users and in the scope of the pilot. Thus, servers that can be scaled up in terms of disk storage, RAM, number of network cards, and number of processors make a lot of sense for Global.

What Scalability Is Not

It is important to note that scalability is not about handling the spikes that occur during periods of peak activity. For example, month-end close should not exercise the scalability of the SharePoint Portal Server solution. Month-end close will only tax the sizing of the solution, proving that the system was either adequately sized or not. Remember, the sizing process seeks to determine the peak load that must be addressed by the Portal.

Configuring a Microsoft SharePoint Portal Server solution for scalability only starts with the sizing process. Once a server configuration (including the number and type of servers, and the RAM, CPU, network card, disk controller, and other hardware resources required for each server) capable of addressing the business needs is designed, the configuration must then be looked at with an eye toward redesigning it for appropriate levels of scalability, high availability, manageability, and more.

Planning for Scalability at Global Corporation

Global Corporation's SharePoint Portal Server pilot project sizing exercise identified the need for three crawling servers to cover North America. Once the design was reviewed for scalability, however, it became clear that four dedicated crawling servers would provide not only the capability needed immediately, but also the headroom (the scalability) to absorb unanticipated growth in core business requirements. This approach also addressed Global's need for high availability via the commonly employed "n+1" approach. That is, losing one of the North America crawling servers would not impact the Portal's ability to crawl the expected number of sites, because the sizing exercise showed that only three crawl servers are actually required.
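The n+1 reasoning above can be expressed as a small capacity check. The following is a minimal Python sketch, not part of Global's actual tooling; the function names are hypothetical, and the only figures taken from the text are the three required and four deployed crawling servers.

```python
def surviving_capacity(deployed: int, failed: int = 1) -> int:
    """Number of servers still running after `failed` servers are lost."""
    return max(deployed - failed, 0)


def meets_sizing(deployed: int, required: int, failed: int = 1) -> bool:
    """True if the sized requirement is still met with `failed` servers down."""
    return surviving_capacity(deployed, failed) >= required


REQUIRED_CRAWL_SERVERS = 3  # from the sizing exercise described in the text
DEPLOYED_CRAWL_SERVERS = 4  # the n+1 configuration chosen for North America

# With four servers deployed, losing one still leaves the three that are required.
print(meets_sizing(DEPLOYED_CRAWL_SERVERS, REQUIRED_CRAWL_SERVERS))  # True

# With only the three sized servers deployed, a single loss falls short.
print(meets_sizing(REQUIRED_CRAWL_SERVERS, REQUIRED_CRAWL_SERVERS))  # False
```

The same check applies whether the server is lost to a failure or taken offline deliberately.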

Rolling Upgrades and Maintenance Cycles

The n+1 approach illustrated previously also afforded Global another valuable benefit: the ability to bring each redundant server offline (one at a time) to perform system updates, OS upgrades, and so on. This capability allows for future planned maintenance or upgrades, usually described as a "rolling" maintenance cycle or "rolling" upgrade, and it does so without impacting the availability and capability of the Portal solution overall. Planning for scalability truly lends itself to solving these other business problems!

Similar tactics can be employed in regard to other SharePoint Portal Server technical limitations or business needs. The fact that the number of indexes that may be propagated is effectively limited to four per dedicated search server may be handled in the same manner: Global can plan to implement an additional search server, and reap the benefits of rolling upgrades here too. Likewise, the practical limitation of 15 workspaces per server lends itself to this methodology: plan on implementing an additional server, and should unexpected downtime occur, a duplicate of the workspaces on the down server may be hosted across the remaining available servers.
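As a rough illustration of how these per-server limits translate into a server count, the sketch below divides a workload by the per-server limit, rounds up, and adds one spare. It is only a sketch under stated assumptions: the limits of four indexes and 15 workspaces come from the text, while the function name and the sample workloads (8 indexes, 40 workspaces) are hypothetical.

```python
def servers_needed(items: int, per_server_limit: int, spares: int = 1) -> int:
    """Servers required to host `items` within the per-server limit, plus spares."""
    if items < 0 or per_server_limit <= 0 or spares < 0:
        raise ValueError("items and spares must be >= 0 and the limit must be > 0")
    base = (items + per_server_limit - 1) // per_server_limit  # integer ceiling
    return base + spares


INDEXES_PER_SEARCH_SERVER = 4  # effective propagation limit cited in the text
WORKSPACES_PER_SERVER = 15     # practical workspace limit cited in the text

# Hypothetical workloads, chosen only to show the arithmetic.
print(servers_needed(8, INDEXES_PER_SEARCH_SERVER))   # 2 sized + 1 spare = 3
print(servers_needed(40, WORKSPACES_PER_SERVER))      # 3 sized + 1 spare = 4
```

In the workspace example, the three servers remaining after a single failure can host up to 45 workspaces between them, which is enough to absorb the 40 in use.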

Stress-Testing for Proof-of-Concept

Even with all of the support and expertise brought to bear on their pilot, Global's SPS project steering committee still had questions about the raw performance of the solution. Risks were weighed against costs in terms of pilot failure, and another $50,000 was budgeted for a comprehensive customized stress test (also often referred to as a load test). The goal was simple: to measure the performance impact that would be typical of what the production system would bear during a peak period of activity.

Like any company that endured the sizing and characterization process and accurately completed the various sizing questionnaires, Global was able to leverage this data to identify the number of end users and types of transactions or activities that each would be performing during a peak period. At that point, it was believed that it was only a matter of scripting these business transactions, and then "playing" them back (and measuring performance) on the system that would eventually become the production environment. The following considerations quickly became apparent:

  • A scripting tool would be required to script the business activities and other transactions.

  • One or more technical specialists would be needed to head up and complete the stress test.

  • Business and other functional specialists would need to be made available to the aforementioned technical specialists, to ensure that the business activity was scripted realistically.

  • A method would need to be found to execute the scripts so as to represent 1,000 actual end users.

  • Sample data and other resources would need to be made available to support the scripted activities, including documents to be managed, workspaces created, category structures created, Web sites made available to support crawling/searching, and so on.

  • An approach to monitoring the performance of the end-to-end SPS solution would need to be drafted and utilized to measure critical performance criteria like online response times, average crawl times, overall disk performance, and more.

After a meeting of nearly two days between the Senior Solutions Architect and his Team Leads, it was decided that the Server Team Lead would manage the stress testing engagement as well. His background in lab-based server and disk subsystem analysis and testing tools made him a good fit for driving the stress test to completion. He immediately turned to his hardware vendor and Microsoft to put together a short list of tools and approaches that addressed all of the considerations outlined previously. In the end, he spent half of his budget on a partner/consultant experienced with toolsets capable of supporting virtual Web users, and spent another $10,000 on temporary licenses for the testing tool.

After two weeks of working with the business folks and developing initial scripts, and another two weeks fine-tuning the scripts for virtual user support (as opposed to requiring 1,000 physical desktops on which to run the various tests), Global was ready to start executing test runs against the soon-to-be production SPS pilot environment. The system was tuned between runs, and by the end of the second day of testing, it was clear that the production system was undersized for peak loads. Additional RAM was added, as were additional disks, and the disk subsystems supporting the DMS were tuned throughout the solution stack. Another round of tests then demonstrated that the system was CPU-bound when performing searches, and another pair of processors was added to the search servers.

After this, things looked really good overall, and the steering committee was notified of this fact. They were also made privy to the following lessons learned, shared throughout the entire project team:

  • The amount and variety of data impacted overall performance substantially. That is, with very little data loaded, most everything ran out of the disk controller's cache and therefore artificially inflated the performance numbers. Only when a more realistic amount of data was loaded did it become apparent that the disk subsystem was in need of both an upgrade and incremental tuning.

  • Ramping up the 1,000 virtual end users over a 15-minute period was time-consuming but absolutely critical. This required creating a random number generator based on the unique machine name of each individual virtual user, and then staggering when each virtual user would log in to the SPS system based on this random number. Otherwise, all users would attempt to log in at about the same time and crater the system.

  • The Stress Test environment used for the testing was subject to changes and modifications throughout the month of actual script development and testing. As such, the scripts themselves had to be continually fine-tuned and adjusted simply to keep working a week after they were created. In the future, the stress test system would be "locked down" for the duration of testing.

  • Scope Creep nearly became a problem, as the business and other functional users continued to add more business processes and activities to the agreed-upon list of activities to be scripted. The Senior Solutions Architect and Project Manager addressed this early on, and froze changes in this regard by the end of the first week of script development.

  • Finally, not all scripting tools (nor the companies that write these often full-featured and therefore potentially complex testing suites) are created equal. Tools should be selected based not only on price, but also on ease of access to support people, should assistance be needed at the last minute. This is especially true of testing tools that support virtual users, for it is much more difficult to troubleshoot these types of scenarios. Plus, it must be noted that a lot of horsepower is required to actually emulate 1,000 clients: Global initially brought in two quad-CPU/2.5GB RAM servers, each to service 500 virtual users, but wound up adding another two servers.
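The staggered ramp-up described in the lessons learned (a random login delay derived from each virtual user's machine name, spread over 15 minutes) might be sketched as follows. This is an illustrative Python sketch rather than the script Global actually used; the machine-name pattern "VUSER-0001" and the function names are assumptions, and a real load-testing tool would supply its own scheduling mechanism.

```python
import hashlib
import random

RAMP_UP_SECONDS = 15 * 60  # the 15-minute ramp-up period from the text


def login_delay(machine_name: str, ramp_up_seconds: int = RAMP_UP_SECONDS) -> float:
    """Pseudo-random login delay (seconds) derived from the machine name."""
    # A SHA-256 digest gives a seed that is stable across runs and machines;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(machine_name.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return random.Random(seed).random() * ramp_up_seconds


def build_schedule(machine_names):
    """Return (delay, machine_name) pairs ordered by login time."""
    return sorted((login_delay(name), name) for name in machine_names)


# Hypothetical machine names for 1,000 virtual users.
names = [f"VUSER-{i:04d}" for i in range(1, 1001)]
schedule = build_schedule(names)

print(len(schedule))  # 1000
first_delay, first_user = schedule[0]
last_delay, last_user = schedule[-1]
print(f"first login: {first_user} at {first_delay:.1f}s")
print(f"last login:  {last_user} at {last_delay:.1f}s")
```

Because the seed comes from the machine name, each virtual user receives the same delay on every run, which keeps successive test runs comparable while still spreading logins across the ramp-up window.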

As we can see, stress testing is by no means a trivial task. It is complex, time-consuming, and subject to scope creep like any other project. But it is also an insurance policy, and as such represents an excellent investment when the cost of not meeting your SharePoint Portal Server performance or scalability requirements outweighs the cost of the stress test itself.


                 
ISBN: 0789725703
Year: 2002
Pages: 286
