In addition to creating your change log and change control process, you should create a change management policy documenting the procedures around any change that might occur in your Terminal Server environment.
Implement the policy with all changes—be they as "insignificant" as updating a single DLL or INI file, or as major as a service pack or operating system update.
The policy should provide for the following:
Definitions of development, testing, and production environments.
Lists of who has access to each environment.
Definition of the standard change control cycle.
Procedures for requesting a change.
Service Level Agreements (SLAs) for different types of changes.
Identification of those individuals who can sign off on a change in each environment.
Procedures for emergency changes.
Procedures for logging changes as they progress through the cycle.
These items describe a general change control policy. Your specific applications or hardware might demand a more specific policy. Let's focus on each component of the change management policy.
If you're serious about building a stable Terminal Server environment, you'll ideally create three identical, yet separate, environments:
A development environment to be used for initial testing of changes.
A user acceptance testing environment, to be used for final testing before moving into production.
A production environment that incorporates the changes that have successfully moved through the other environments. These are the servers that users will access on a daily basis.
In some small deployments, building three environments is simply not possible due to the cost of extra servers and licensing. If this describes your situation, you should still build two environments (development and production). If necessary, a workstation-class machine for your development environment is better than nothing.
In the development environment your team is able to test new changes, determine the effects of these on the servers, and test and decide on a deployment strategy for the change. This environment can also be used for simple baselining of new applications and application upgrades, testing new service packs and hotfixes, and testing new scripts.
In larger environments, developers can also be given access to the development environment in order to develop and test any application upgrades or changes.
As anyone with experience in the industry knows, developers have a nasty habit of "writing and flying," throwing some code out on production servers then going on vacation for three weeks while you troubleshoot problems. (To be fair, developers feel that we systems engineers do nothing but hamper their creativity with all our pesky change management.) By giving the developers an environment in which to run tests, and by limiting their access rights to other areas, you can (theoretically) protect yourself from the "write and fly" problem.
It's good practice to design the development environment to be completely independent from the production servers. The idea is that this "dev" environment is an exact replica of the production environment (only smaller). Any change made in the development environment should affect the development servers in the same way that it would affect the production servers. The servers from both environments should be built using the same image and have the same updates and changes performed on them. The result is a real test environment that can accurately demonstrate how changes will affect production users before promoting them into the production environment.
Proposed changes are generally approved by the necessary people before being promoted into the next change control phase: the user acceptance testing environment.
In the user acceptance testing (UAT) environment, real end users test the functionality of the applications. These testers should be diverse in their job functions to achieve a good representation of how the applications will be used in the production environment. Think of this as an ongoing pilot.
By monitoring the changes in this testing environment, you can find problems that were missed in the cursory testing of the development environment.
As its name implies, the production environment contains the servers that are used for the day-to-day operations of your Terminal Server systems. You should be extremely protective about any changes to this environment. Once a system is running smoothly in the production environment, the last thing you want is to be forced to pull an all-nighter rebuilding servers from some botched application upgrade.
Even if your production environment is home to many applications and configurations, each server is only part of one silo and therefore only has one image that applies to it. Remember that even if you don't use automated imaging software to deploy your servers, all servers that are in the same silo should have the same "image" in the sense that they are all identical. Any change made to one server should be made to all servers. (Of course, any change made to these production servers previously endured exhaustive testing as outlined in this change management process.)
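One way to check that the servers in a silo really are identical is to compare what is installed on each of them. The following is a minimal sketch in Python, not a tool described in this chapter. The server names, the update identifiers, and the `find_drift` function are all hypothetical, and the sketch assumes you have already collected the list of installed items from each server by some other means.

```python
def find_drift(silo: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, for each server, the items it lacks that some other
    server in the same silo has installed."""
    # Every item seen on at least one server in the silo.
    everything = set().union(*silo.values())
    drift = {}
    for server, items in silo.items():
        missing = everything - items
        if missing:
            drift[server] = missing
    return drift


# Hypothetical inventory for one silo (names are illustrative only).
silo = {
    "TS01": {"OfficeXP-SP1", "hotfix-A"},
    "TS02": {"OfficeXP-SP1", "hotfix-A"},
    "TS03": {"OfficeXP-SP1"},
}
print(find_drift(silo))
```

An empty result means every server in the silo reports the same set of items. Anything else points to a change that was applied to some servers but not to all of them.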
As innocent as the question may sound, deciding who has access to each of the three change management environments is critical. A more accurate phrasing of this question is, "Who needs access to each environment and what level of access do they need?"
For the sake of simplicity, let's divide access to a Terminal Server into two groups: administrator-level access and user-level access. Individuals with administrator-level access can install applications, make configuration changes, and essentially do whatever they want to a server. Those with user-level access have the ability to connect to and run applications on Terminal Servers but cannot install applications or modify permissions.
Figure 15.2 illustrates how companies typically allocate access to their employees.
Development, administrator-level access: Administrators, Application Developers, Application packagers
User Acceptance Testing, user-level access: Production Users, Application Developers, Application packagers
Production, user-level access: Production Users, Application Developers, Application packagers
This security configuration protects your environment while allowing you to truly replicate the production servers. When a change leaves the development environment, it should be ready to work in production. If not, the change should be rolled back into development for further work. By isolating the developers and packagers in the testing environment, you discourage the "little tweaks" made by the coders that fail to replicate the true change.
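The two access levels can be written down as a simple lookup table. The sketch below is only an illustration under stated assumptions: the role names, the `ACCESS` table, and the `can_install` function are hypothetical, and giving administrators administrator-level access in every environment is an assumption rather than something the text states.

```python
from enum import Enum


class Access(Enum):
    NONE = 0
    USER = 1
    ADMIN = 2


# Hypothetical access matrix. Developers and packagers get
# administrator-level access only in development (an assumption
# drawn from the discussion above).
ACCESS = {
    "development": {
        "administrator": Access.ADMIN,
        "developer": Access.ADMIN,
        "packager": Access.ADMIN,
    },
    "uat": {
        "administrator": Access.ADMIN,
        "developer": Access.USER,
        "packager": Access.USER,
        "production_user": Access.USER,
    },
    "production": {
        "administrator": Access.ADMIN,
        "developer": Access.USER,
        "packager": Access.USER,
        "production_user": Access.USER,
    },
}


def can_install(role: str, environment: str) -> bool:
    """Only administrator-level access may install or reconfigure."""
    level = ACCESS[environment].get(role, Access.NONE)
    return level is Access.ADMIN
```

With this table, `can_install("developer", "development")` is true, while the same developer cannot install anything in UAT or production.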
Your change control policy will ultimately provide a cycle that repeats as changes are required to your Terminal Servers. Although the exact cycle will vary from company to company, Figure 15.3 outlines the basic flow of the steps.
Figure 15.3: The complete change control cycle
As the flow chart depicts, success points throughout the process ensure that the overall process will be successful. Any problems introduced by the change can be rolled back into development for further testing. Trying to fix a problem in the production environment does nothing more than introduce changes to the system that might not be necessary.
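The promotion and rollback rule can be sketched as a tiny state machine. This is a simplified reading of the text, not a reproduction of Figure 15.3: the `STAGES` list and the `next_stage` function are hypothetical names, and the sketch assumes that any failed check sends the change back to development.

```python
# Order in which a change moves through the environments.
STAGES = ["development", "uat", "production"]


def next_stage(current: str, passed: bool) -> str:
    """Return the environment a change moves to after a success point.

    A failure at any stage rolls the change back to development.
    A success promotes it one stage; production is the final stage.
    """
    if not passed:
        return "development"
    index = STAGES.index(current)
    return STAGES[min(index + 1, len(STAGES) - 1)]
```

For example, `next_stage("uat", True)` returns `"production"`, while `next_stage("uat", False)` returns `"development"`. There is no path that fixes a problem in place in production.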
As a Terminal Server administrator, you most likely receive requests for change every day. (Of course, most of these are not official requests and therefore do not get treated as such.)
In an ideal environment, each change request should be documented, providing an audit trail for why each change was made, when it was requested, who requested it, and its impact on the business.
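If you track requests in a script or a small database rather than on paper, the audit trail might be captured in a record like the one below. This is a minimal sketch: the `ChangeRequest` class, its field names, and the `record` method are hypothetical, chosen only to mirror the four questions above (why, when, who, and business impact).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangeRequest:
    number: int
    requested_by: str      # who requested the change
    requested_on: date     # when it was requested
    reason: str            # why the change is needed
    business_impact: str   # its impact on the business
    log: list[str] = field(default_factory=list)

    def record(self, when: date, note: str) -> None:
        """Append a dated entry as the change moves through the cycle."""
        self.log.append(f"{when.isoformat()} {note}")


request = ChangeRequest(
    number=119,
    requested_by="Susie Doe",
    requested_on=date(2003, 11, 26),
    reason="Office XP Service Pack 1 is being rolled out to the desktops",
    business_impact="Keeps the server farm in line with Office standards",
)
request.record(date(2003, 11, 26), "request received")
```

Each call to `record` adds one line to the log, so the history of a request can be read back in order.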
In general, changes will come from application owners or managers within the business, but there will also be change requests from within the IT staff (such as a new service pack or hotfix). In either case, the change should follow the official change management cycle.
As an administrator, it's in your best interest to see only change requests that are based upon legitimate business purposes. (You can't always control this, but that doesn't mean that it's not in your best interest.)
For the most part, completing a change request is a split effort between the IT staff and the person requesting the change. The requester specifies some basic information about the change and the IT staff fills in the rest of the blanks. (A sample change request appears below.)
In this change request (like many), only the basic information about what needs to be changed and why is supplied by the requester. It is up to the IT staff to elaborate about how the change will be implemented and tested. This information can then be given to the IT manager or business managers for approval.
To: Terminal Server Admin Team
From: Susie Doe (Office Application Owner)
Date: November 26, 2003
Subject: Tier I - Office XP service pack update
Level of Request: Level II
Request Number: 119
Service Pack 1 for Office XP has been released and is currently being rolled out to the desktops. The current version of Office on the Terminal Server farm needs to be updated to this new service pack in keeping with our Office standards.
Potential Impact on Environment
The Service Pack could cause problems with other applications dependent on Office. A back-out plan for this upgrade should be designed. Existing applications that have Office dependencies may not support this upgrade. Vendors should be contacted and applications should be retested in the test environment prior to deployment.
The IT staff will complete integration testing for the service pack on one Tier 1 server in the test environment. The Packaging team will then test the automated distribution of the package to the Dev and UAT servers. Once the change has been validated and no problems exist in Dev and UAT, the change will be deployed to production. The IT staff will continue to monitor the environment to ensure that no problems arise after this service pack has been applied.
This change will take approximately one week to complete testing and deployment. Required resources to facilitate this effort are one engineer to test and deploy the update.
Review all available technical documentation for the update.
Plan and architect integration of the update into the existing environment.
Document installation procedures. Modify when necessary to stay within best practices for the server farm and supported application environment.
Apply the upgrade to the dev environment on one test Tier 1 server.
Test for connectivity and usability, making sure to re-test existing applications in the environment.
Document all processes.
Coordinate user acceptance testing to functionally test the update in the UAT environment to ensure that no issues exist. Obtain user signoff once complete.
Install the update in the production environment over the weekend to avoid disrupting users. To ensure redundancy while this is taking place, install the update on one server at a time.
Coordinate a final production signoff to ensure that all functionality still operates properly. Obtain user signoff once complete.
The turnover date for production is estimated for December 2, 2003. Turn over SOP information and documentation to Operations Staff.
A simple service level agreement (SLA) addressing change requests is required in all environments. An SLA defines the minimum or maximum amount of time it will take a change to be tested and deployed into the production environment. The SLA definition usually does not include the process of requesting or approving the change; rather, it identifies how long it should take to get the change into production. The timeframe associated with a change usually depends on the priority level of the change request.
The necessity of defining a change request SLA is appreciated by anyone who has ever encountered the following scenario:
Friday 2:45pm: "Hey, Joe. I need you to drop this update on the servers today... Yeah, I promised the staff I would have it updated by this weekend but it has taken me longer to put it together than I thought. We REALLY have to have this. Have a good weekend!"
Of course, most people's first inclination is to help the guy out. The problem arises when you realize that since this is Friday afternoon, no real testing of the change will happen before the weekend. Slapping this untested change on the servers could cause you and your users a lot of grief on Monday morning if something else is broken by the change.
A documented SLA gives you something to fall back on. You can say, "Sorry, our change control SLA states five days minimum for testing. We can't do anything until we run it all through the process." Saying this might not win you any friends, but at least you won't be working over the weekend.
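A minimum testing window like this is easy to check mechanically. The sketch below assumes the five-day minimum from the example reply and counts calendar days, which is an assumption, since your SLA might count business days instead. The `MIN_TEST_DAYS` table, the level name `"standard"`, and both function names are hypothetical. Other priority levels would get their own entries.

```python
from datetime import date, timedelta

# Minimum testing window per priority level, in calendar days.
# Only the five-day figure comes from the example above.
MIN_TEST_DAYS = {"standard": 5}


def earliest_production_date(submitted: date, level: str = "standard") -> date:
    """Earliest date the SLA allows the change to reach production."""
    return submitted + timedelta(days=MIN_TEST_DAYS[level])


def meets_sla(submitted: date, requested: date, level: str = "standard") -> bool:
    """True if the requested deployment date leaves enough time to test."""
    return requested >= earliest_production_date(submitted, level)
```

A request submitted and wanted in production on the same day fails the check, which is exactly the answer Joe needs for the caller in the scenario above.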
Each environment serves a different purpose and is used by different people. In order for a change to move between the development, testing, and production environments, someone has to verify that testing was successful in the previous environment before the change is rolled into the next.
If we look at a simple scheme of what goes on in each environment, we can easily correlate who should determine at each stage whether the change was successful and caused no other problems.
The development environment is generally used for testing new code and other changes. Applications are reviewed for functionality and are regression tested. The change is also tested to determine if it will have an effect on performance.
In this environment, it is generally the administration team or an IT manager that signs off on the change. They review the testing results, then authorize that the application be promoted into UAT.
In the UAT environment, current applications and builds are combined with new changes to simulate the production environment. More IT staff testing may happen here, but the idea is that actual users use the system as they would on a daily basis.
Users will report any problems in this testing environment. Their sessions should be monitored closely for both performance-related issues that could negatively impact the production environment and for any problems caused in other applications. Once the users find there are no problems with the system, the change request can be handed off to the IT manager (or other approver).
Once the change has been implemented in the production environment, it should still be monitored for any problems not found in the first two environments. In theory, proper testing and movement through the change control cycle should prevent a "bad" change from getting into production. However, it does happen, and you should be prepared for it. (The most frequent cause of bad changes making it into production is the use of non-identical hardware in the production and testing environments.)
In the real world there are always emergencies. Your change control policy and SLAs will have to allow for emergency changes to be implemented. Consider the following two scenarios:
A new Microsoft security bulletin was just issued. It describes a vulnerability of Terminal Servers to denial of service attacks. You realize that you have ten Terminal Servers accessible by the Internet and need to apply this fix right away.
You have been experiencing a consistent blue screen of death on two of your servers. You finally receive the information from Microsoft on what is causing it and how to fix it. This fix requires a driver update in the form of a hotfix which needs to be applied to get your users working again.
In both of these cases the changes need to be implemented as soon as possible. Your users (and management) cannot sit around for a week or two while you test the change. In situations like this, most companies opt to apply the change to the production environment after receiving sign off on the development environment—completely bypassing the testing environment.
Sign off for the implementation of the emergency change usually comes from someone higher up in the company than would be required for a standard change. In most cases, the IT staff requiring the change submits an emergency change control request. The person signing off on the request generally reviews the facts, the possible impact of the change, and the existing problem, and then determines whether an emergency change is best for the business. In some cases, the approver may reject the "emergency" status of the change, instead opting for the standard change control process.
When facing an emergency change, the testing process is usually shortened. Emergency changes are (at minimum) tested in the development environment to ensure that the change installs properly and causes no immediate issues. Then, the "back out" plan is tested (for example, by uninstalling the hotfix). Once these two tests are completed successfully, the testing team can obtain sign off and implement the change in the production environment.
The reality that an emergency change could become necessary at any time underscores the importance of maintaining a development environment that is identical to your production environment, both in terms of hardware and software.
Though emergency changes follow their own shortened change process, it's still important that they are documented properly in order to maintain an identical build across all three environments and modify your image for future server deployments.
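The shortened emergency path amounts to three gates: the change installs cleanly in development, the back-out plan works, and someone has signed off. The sketch below illustrates that rule only. The `EmergencyChange` class, its field names, and the `ready_for_production` function are hypothetical, and who counts as an acceptable approver is left to your own policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmergencyChange:
    description: str
    installs_cleanly_in_dev: bool = False  # installed in development with no immediate issues
    backout_tested: bool = False           # back-out plan was tested in development
    signed_off_by: Optional[str] = None    # approver for the emergency change


def ready_for_production(change: EmergencyChange) -> bool:
    """All three gates must pass before the change bypasses UAT."""
    return (
        change.installs_cleanly_in_dev
        and change.backout_tested
        and change.signed_off_by is not None
    )
```

A change with a successful install but an untested back-out plan is still not ready, no matter how urgent it is.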