2.2. Design Decisions

The business requirements are as follows:
2.2.1. Versions

Based on this information, the first design decision to be made is which version of MOM 2005 to implement. There are two versions: the MOM 2005 edition, which includes all features and scales from the smallest installations to the largest, and the MOM 2005 Workgroup edition. The Workgroup edition has the same core infrastructure as the full edition, but it only supports up to 10 managed computers (agents). It does not include reporting or connectivity features. MOM 2005 Workgroup edition requires that all components, except for the consoles, are installed on a single piece of server hardware. Other features and restrictions of the MOM 2005 Workgroup edition include:
The MOM 2005 Workgroup Edition is intended, and priced, for very small environments that require limited functionality. Because the Leaky Faucet environment has more than 10 servers to monitor, and reporting and redundancy are required, the full MOM 2005 version must be used.

2.2.1.1. Management groups

The next step is to decide how many management groups are required and determine the number of management servers in each management group. As mentioned in Chapter 1, every management group has from 1 to a maximum of 10 management servers, an operations database, managed computers, and Administrator and Operator consoles. All MOM infrastructures start with a single management group, which can be added to if required. This is similar to AD planning, which starts with a single domain and grows from there.

A single management server can manage up to 2,000 agents, and a management group can administer up to 4,000 agents. If you think the math doesn't add up, you're right. The operations database, of which there can only be one per management group, is the limiting factor of the management group, not the management servers. There is more to capacity planning than this (see the "Pre-Installation Configuration Decisions" section later in this chapter). For each category, the maximum amount is as follows:
Since Leaky Faucet only has 50 servers to manage, a single management group will suffice for the production environment. But what about testing? The Leaky Faucet Windows team knows that new applications and servers will be introduced into their environment. The team has to provide a stable production environment, so testing new applications and servers in production is unacceptable. To solve this issue, Leaky Faucet creates a preproduction management group. A preproduction management group is used for two tasks:
Tweaking MOM 2005 (see the "Using MOM 2005" section in Chapter 1) is actually tweaking the management packs. During the tweaking stage, you can enable or disable processing rules and adjust the alert thresholds to get usable information. Managed code (.NET programs) can be written to govern how an agent will respond to an event on a managed computer. You may also need to develop management packs, with all of their components, from scratch. A misconfigured management pack will produce unpredictable and potentially disruptive results. For successful MOM 2005 operations, it is essential to test management pack changes before you introduce them into your production environment. This is explained in Chapter 4. Management packs are very complex; tweaking, testing, and developing them in production is a bad idea.

There are other reasons to have multiple management groups, such as for security boundaries between groups. For example, if a business has multiple independent units, each with its own IT infrastructure and administration, then it would be appropriate to separate management groups along the administrative boundaries. Another reason to have multiple management groups is to separate certain types of data, such as the security event logs, from the rest of the alert, performance, and event data. You may want to do this for capacity or administrative reasons.

Scalability is another reason for multiple management groups. Say a company has more than a few hundred managed computers at multiple remote sites, but wants to manage alerts at a central datacenter location. That company can place management groups at each remote site and configure a MOM-to-MOM Product Connector (MMPC) to forward alerting and other information to the central datacenter site for administration. This creates a tiered MOM infrastructure: the lower tier is the source management group and the top tier is the destination management group.
For more information on integrating multiple management groups into a single reporting structure and integrating MOM to other operations management products, see Chapter 9.

Leaky Faucet has only one Windows Active Directory domain in the production environment and will perform all management at the central headquarters location. Leaky Faucet creates two separate management groups, one for production and one for preproduction, with no connectivity between them.

2.2.1.2. Management servers and the operations database

Since Leaky Faucet only has 50 servers to manage, even if you add the servers that will host MOM 2005 and allow for 100 percent growth, the number of servers is comfortably within the capacity limits of a single management server (2,000 agents). But since the director of IT wants this solution to be highly available, every effort must be made to eliminate single points of failure. Therefore, a second management server will be added to the management group. All agents that are owned by that management group are automatically made aware of the additional management servers. The agents can dynamically fail over to a secondary management server if there is a loss of communications with the primary management server.

When planning for failover with a large number of agents, the sum of the number of failed-over agents and the number of agents already managed by a single management server cannot exceed 2,000. For example, if you have a management group that manages 3,000 agents and you split the agents evenly between two management servers, each server already manages 1,500 agents, so there is no capacity left for failover. To allow for continued operations in the event of a failure, you would need to have at least 3 management servers with 1,000 agents each.

Leaky Faucet implemented two management servers in the production management group and one in the preproduction management group, since high availability is not a requirement for preproduction.
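The failover arithmetic above can be checked with a short script. This is only a planning sketch, not part of MOM 2005: the function names are hypothetical, and it assumes that agents are split evenly across management servers and that, in the worst case, all agents from one failed server land on a single surviving server. The limits (2,000 agents per server, 4,000 agents and 10 management servers per group) are the ones quoted in this chapter.

```python
import math

# Limits quoted in this chapter for MOM 2005.
MAX_AGENTS_PER_SERVER = 2000
MAX_AGENTS_PER_GROUP = 4000
MAX_SERVERS_PER_GROUP = 10


def can_fail_over(agents: int, servers: int) -> bool:
    """Return True if one management server can fail without overloading another.

    Assumption (not from MOM itself): agents are split evenly, and every
    agent from the failed server moves to one surviving server.
    """
    if agents > MAX_AGENTS_PER_GROUP:
        return False
    if not 2 <= servers <= MAX_SERVERS_PER_GROUP:
        return False  # failover needs at least two servers
    per_server = math.ceil(agents / servers)
    # Agents already on the survivor plus the failed-over agents.
    return 2 * per_server <= MAX_AGENTS_PER_SERVER


def min_servers_for_failover(agents: int) -> int:
    """Return the smallest server count that satisfies can_fail_over."""
    for servers in range(2, MAX_SERVERS_PER_GROUP + 1):
        if can_fail_over(agents, servers):
            return servers
    raise ValueError("agent count exceeds what one management group supports")


if __name__ == "__main__":
    for agents in (100, 3000):
        print(agents, "agents:", min_servers_for_failover(agents), "servers")
```

Under these assumptions, 3,000 agents need three management servers, which matches the example above, and Leaky Faucet's 50 servers, even after doubling, fit on two.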
Without the redundancy requirements for preproduction, the cost of an extra server with software licensing cannot be justified. So then, how will Leaky Faucet respond to a single point of failure in the single operations database? High availability for this component can be addressed in three ways.
However, a clustered solution requires that one cluster node is idle while the other node hosts the clustered application. If the first node fails, the second node takes ownership of the application and continues to run it. Leaky Faucet cannot justify the expense of a computer sitting idle in addition to the cost of the external drive enclosure that the two cluster nodes would share. Max, with the support of the director of IT, chooses not to cluster the management group database server, but the database will be on a RAID 10 disk configuration.

2.2.1.3. Additional features and services

Leaky Faucet needs to provide services that are not included in the basic MOM 2005 feature set. Both the business unit manager and the CFO want usage and tracking reports for auditing purposes and to have a record of IT service. The CFO wants secure reports and access to them without going through the IT staff. The remote site support staff will access MOM 2005 data across a slow WAN link and through a firewall. They will not perform configuration duties.

Max recognizes that a MOM 2005 Reporting Server component must be included in Leaky Faucet's deployment of MOM 2005. Whereas the operations database supports the monitoring, alerting, and configuration functions of MOM 2005, the Reporting Server database supports the reporting function. MOM 2005 reports are based on historical data and provide a longitudinal view of monitored application performance and the servers the applications run on. A scheduled task, which runs once a day by default, transfers the live data from the operations database to the Reporting Server database. The dataflow in the transfer is unidirectional, going from the operations database to the Reporting Server database only. The transfer is accomplished by a SQL Server DTS package. This scheduled task is coordinated with the grooming jobs that delete old information from the operations database.
This ensures that no data is removed before it has been transferred over to the Reporting Server database.

MOM 2005 reporting uses SQL Server Reporting Services. SQL Server Reporting Services is a separate product from MOM 2005 and requires separate installation. Security can be applied to individual reports or their folders, much like NTFS permissions applied in a file structure hierarchy. This meets the CFO's requirement for secure reports, and the reporting web site can also be secured with SSL. For the business unit manager, MOM 2005 reports include service uptime and outage information. These can be mailed directly to her Exchange mailbox.

Unlike the full-featured Operator console, the MOM 2005 Web console displays only the information essential to managing your environment. It presents the operational data by alert view, computer view, and event view. The remote site support staff can use either the Web console or the Operator console. What the remote site support staff sees can be controlled via console scopes. A console scope is an association created by a MOM administrator between a user account and one or more computer groups. It allows users to see only data from computer groups that are associated with their user accounts. The information they have access to will be filtered by the scope assigned to the logged-on user. The Web console is best when there are firewall issues or when working remotely. It is lightweight and all communication occurs over port 1272, so arranging an open port in the firewall should be easy.

2.2.2. Initial MOM 2005 Design

By this time, Max has made enough design decisions to develop a design diagram, as shown in Figure 2-1. All management servers will be centrally located at headquarters. The Web and Operator consoles will be available for remote access, and the Operator and Administrator consoles will be available locally.
Note that when you run MOM 2005 setup for the MOM user interfaces (UIs), both the Operator and Administrator consoles are loaded, so they are always on the same machine.

Figure 2-1. High-level diagram of the MOM 2005 infrastructure at Leaky Faucet

Based on this design, Leaky Faucet will need to purchase the following:
This basic infrastructure should meet Leaky Faucet's needs for some time and can be easily extended. If a third management group is needed, then a similar planning exercise should be undertaken to develop redundancy and capacity requirements. Connectivity requirements will also have to be developed, as discussed in Chapter 9.

The choice of hardware for a MOM 2005 setup should be influenced by three factors: desired performance, desired redundancy, and available funds. Your hardware solution will be unique to your situation, but here are some useful guidelines:
Microsoft has well-published minimum hardware standards for each possible type of server in a management group. Figure 2-2 is a screenshot of the MOM 2005 Sizer.xls tool that gives you an idea of the load you can expect and the minimum-size hardware needed. This tool is included in the MOM 2005 Resource Kit and available at http://www.microsoft.com/downloads/details.aspx?FamilyId=93930640-FA0F-48B3-8EB0-86836A1808DF&displaylang=en. However, it is no substitute for testing in your environment.