Mobility | Building N1 Grid Solutions: Preparing, Architecting, and Implementing Service-Centric Data Centers

This section extends the discussion of virtualization and provisioning in the previous sections to include mobility. Mobility reflects the ability to move a particular service component around the data center, and the ease with which that can be done. The N1 Grid software simplifies the operational aspects of mobility by supporting flexibility in the part of the stack or data center where mobility is enabled. How will you use the ability to provision or remove a section of the stack? In which layer of the stack is mobility most important to you?

It is important to examine three roles in defining what mobility means for your environment:

A thing that is deployable and mobile
A place that hosts deployable things
Something that might need to know if something is deployed, moved, or removed

For the first role, the choices involve those layers in the provider-consumer stack. Depending on the architecture and products in use, each choice can be mobile, and each choice needs to be addressed or interfaced differently.

Storage can be mobile. Whether you are using something crude like dgimport or dgexport or a more sophisticated NAS or SAN strategy, pools of storage are regularly connected to and moved between sets of servers.

With the N1 Grid software managing network components, servers can be mobile: they are wired once and used many times, and their position in the network and security domains is determined by the settings on the switches to which they are connected. Initially, they can be in the DMZ VLAN, acting as a web mail proxy. After reprovisioning, the configuration could place them deep in the protected data center service VLAN to add back-end mail store capacity.

Server partitions can be mobile. Dynamic reconfiguration, Solaris™ 10 Zones, and IP multipathing capabilities blur the physical boundaries that servers typically must obey. Operating systems can be mobile within servers. Disk images containing operating systems can be cataloged, selected when desired, and moved between servers, as mapped for deployment.

Middleware also enables mobility. Whether it is a cluster spanning multiple machines that hosts a resource group allowed to fail over between servers, or a Java™ VM that receives Enterprise JavaBeans and handles stateful failover, a layer of virtualization can exist between the operating system and the application that is running. This is a case where N1 Grid software provisions a service component that then handles stateful failover (a different type of service mobility) in its own way.

As demonstrated in an earlier table, applications can be mobile by using image-based capabilities or N1 Grid Service Provisioning System (N1 Grid SPS) capabilities. Whole applications or groups of applications that make up an end-to-end business service can be installed, removed, or moved between running environments that can host them, so entire business services can be mobile.

The entities that host deployable and mobile entities require special consideration. Some of these considerations were discussed in the virtualization section:

The need to measure and broadcast service level capabilities for different types of loads (for example, knowing that a particular container can host 600 of the required 8000 web connections per second)
The ability to securely discover those capabilities through broadcast, lookup, or other means
How or whether to handle both N1 Grid and non-N1 Grid technology controlled networks in a single switch
The licensing, firmware, and tuning considerations that are often load dependent (that is, the mobile entity will provide the load)

However, you must also consider the need to address policies and business drivers to enable hosting to occur, and the issues of density and backout. Who says that something can be moved? Who says that it should be put in one hosting entity versus another? If there are multiple entities that want the same hosting entity, who wins? If a resource constraint issue occurs, who gets moved off to leave enough room for the other entities to run?

Finally, it is necessary to define a structure for considering the other entities that might want to know when something moves. These entities might include:

Other services that might depend on the mobile entity (for instance, a legacy provisioning server that needs to know where the mobile LDAP master is located)
Assurances that common services such as DNS, NTP, and licensing, as well as authentication, authorization, and observability mechanisms will be able to follow the mobile entity
Security domains, policies, and requirements that provide data for dependency checking
Performance and tuning considerations that provide data for dependency checking

Mobility Business Decisions and Control Fabric

Mobility extends the single instance of provisioning and describes an environment in which the provisionable entity can be installed, moved, and removed multiple times. In this extended environment, the concepts discussed in the provisioning section must be extended to define operational processes that leverage identity and entitlement and the application or service policies to control who is allowed to move services and where those services can be moved. In addition, a verification capability is needed to ensure that the entities have been successfully provisioned and configured, based on either a baseline or the last known state. This capability can also be used for periodic assessment or compliance checking or as a precursor to change. In this way, the mobility of components can be a part of a change management process that takes into account the business motivations, dependencies for the removal, the movement of components, and even the prioritization of those movements in the event that multiple or simultaneous movements are required. The types of allowed activity should be captured in detailed use cases that define the types of dependency checking, rollback, and success criteria to be fully developed for each substep in the mobility use cases.
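The verification capability described above can be illustrated with a minimal sketch that compares an entity's observed configuration against a recorded baseline (or last known state). The setting names and values here are hypothetical, and the function is an illustration only, not a feature of the N1 Grid software.

```python
def find_drift(baseline, observed):
    """Return {setting: (expected, actual)} for every setting that differs.

    A setting missing from one side is reported with None on that side.
    """
    drift = {}
    for key in baseline.keys() | observed.keys():
        expected = baseline.get(key)
        actual = observed.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical baseline recorded when the service was first provisioned.
baseline = {"vlan": "dmz", "ntp_server": "ntp1", "os_image": "web-proxy-v3"}
# Hypothetical state observed after a move.
observed = {"vlan": "service", "ntp_server": "ntp1", "os_image": "web-proxy-v3"}

print(find_drift(baseline, observed))  # {'vlan': ('dmz', 'service')}
```

An empty result would indicate that the move left the entity in its expected state. The same check could be run periodically for compliance or before a planned change.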

Foundation Services

The discussion of mobility extends the infrastructure discussion in the application provisioning section to include considerations such as application removal and multiple instances of provisioning. Foundation services need to support mobility of service components by providing the means for these mobile services to maintain connectivity to the important support functionality they provide. For many services (for example, directory, DNS, and web server), this is accomplished by publishing a virtual interface that responds to requests for the service, which often fronts a load-balanced or distributed set of components.

Agent-based instrumentation or other functionality that might have license and host ID constraints requires additional pre-move or post-move activities so that foundation services can smoothly follow the moving services, with the policies and checks in place to prevent security and data model policy violations as a result of the move. Implementing the move, add, or remove use cases can include important substeps that test for the presence of these foundation services. If you are going to virtualize your environment, you will probably need to work out:

Naming conventions for resources to be virtualized to enable mobility
Licensing
Separation of applications from their compute, storage, and network resources
Rules for contention
Rules and policies for coexistence, compartmentalization, and segmentation (for example, no web servers on important database back ends)
Understanding of the requirements for a particular layer of the service stack
Measurements of the service capability of a particular resource in the service stack with a means to match consumers to providers of the service they require
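The last item, matching consumers to providers by measured capability, can be sketched as a simple greedy selection. This is a minimal illustration that assumes each provider advertises a single measured capacity figure. The function name, the container names, and the greedy strategy are assumptions made for the example, and the 600 and 8000 connections-per-second figures are borrowed from the earlier example in this section.

```python
def select_providers(demand, capacities):
    """Pick providers, largest measured capacity first, until demand is met.

    capacities maps a provider name to its measured capacity, in the same
    unit as demand (for example, web connections per second).
    """
    chosen = []
    remaining = demand
    ranked = sorted(capacities.items(), key=lambda item: item[1], reverse=True)
    for name, capacity in ranked:
        if remaining <= 0:
            break
        chosen.append(name)
        remaining -= capacity
    if remaining > 0:
        raise ValueError("total measured capacity is below the required demand")
    return chosen


# Twenty hypothetical containers, each measured at 600 connections per second.
containers = {f"container{i:02d}": 600 for i in range(1, 21)}
print(len(select_providers(8000, containers)))  # 14
```

Thirteen such containers supply only 7800 connections per second, so the fourteenth is needed to cover the 8000 that the service requires.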

Preparation for Increased Service Density

This section extends many of the mobility concepts discussed in this chapter to include consideration for increased service density in a given operating system, network switch, or storage resource. Part 3 presents an architectural path that leverages N1 Grid, Consolidation, and Solaris™ 10 Zones to solve a common business problem: reducing cost and complexity.

Business Drivers and Feedback Loop

Permission and conflict resolution become more complex with higher density. It will be necessary to create business processes and policies that define how to resolve issues when capacity or performance incidents arise. An example would be a rule that determines which of the multiple running services is the first to be removed in the event of a conflict or a performance, security, or other incident. You must determine and record who is allowed to remove that service and who resolves disputes such as simultaneous demands for capacity between two services with the same criticality. Mobility can require the creation of a new set of roles to handle IT governance.
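A removal rule of this kind can be sketched as follows. The example assumes a hypothetical numeric criticality attribute recorded for each service, where a higher number means more business critical. It only identifies the candidates and flags ties for the governance role to decide; it does not remove anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    criticality: int  # hypothetical scale: higher means more business critical


def first_to_remove(services):
    """Return (candidate_names, needs_arbitration).

    The candidates are the services with the lowest criticality. When more
    than one service shares that level, the rule cannot decide by itself and
    the dispute must go to whoever is recorded as the arbiter.
    """
    if not services:
        raise ValueError("no running services to choose from")
    lowest = min(service.criticality for service in services)
    candidates = [s.name for s in services if s.criticality == lowest]
    return candidates, len(candidates) > 1


running = [Service("mail-store", 3), Service("web-proxy", 2), Service("batch-report", 1)]
print(first_to_remove(running))  # (['batch-report'], False)
```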

One method of resolving these mobility issues is to consider the current frequency of change for each component or layer in your environment. A different layer might be more mobile for a different application or service, even in the same data center. The layers with high frequency of change are good candidates for the N1 Grid system. They are places where you could do the following:

Consider automating repetitive changes to eliminate errors or to speed up deployment
Collect exact copies of useful builds for later retrieval (for example, sometimes it takes longer to implement a test environment from three months ago than it does to actually run a new test in the environment)
Have solid and tested deployment and removal procedures as a basis to automate flexing of a service in response to a load

The second key method for organization and prioritization is to leverage use cases to organize and deliver mobility and density. The classic definition of a use case is that it describes a sequence of actions that are performed by a system to yield a result of value to a user. Use cases were covered in "SunTone Architecture Methodology" on page 43 and are discussed in more detail in Chapter 5. Process definition is a necessary precursor to process automation. Use cases for service mobility and for increasing resource utilization through higher density must be thought through and must include technology, operational, and IT governance steps.

After you have decided to become more operationally efficient on a particular layer of the stack or location in your data center, you must define core use case primitives (for example, add, move, or remove) that match your deployable entities to their possible hosting environments. These primitives must also include the needed dependency checks and notification of progress to other entities. After these two analyses are completed, the architectural, policy, and product choices become obvious.

Dependencies and Use Cases

Increased density requires use cases with additional dependency checks during the move to or removal from an operating system that already contains other running services. You should extend the "allowed with" and "not allowed with" service attributes to include visibility into and control over the possible production environment service combinations, and you should consider extending the binary yes/no attributes to a spectrum that contains the notion of required, recommended, discouraged, and prohibited coexistence. Examples of issues to protect against include installation of web servers on servers that are already running databases (causing tuning conflicts and potential security conflicts), cases where there are conflicting platform security requirements such as operating system minimization and hardening, licensing issues, or clustering complexity for a simple script that is about to be provisioned on a server that is part of a running cluster. For each element to be moved and packed together in the compute, network, or storage resource, remember to consider the dependencies for elements in each layer of the cube, the move's effect on the business service and business rules as a whole, and the move's effect on the people and processes that support services.
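The four-level coexistence spectrum can be sketched as a small policy table that is consulted before a service is placed on a host. The service type names and the contents of the table are hypothetical and serve only to show the shape of such a dependency check.

```python
from enum import Enum


class Coexistence(Enum):
    REQUIRED = "required"
    RECOMMENDED = "recommended"
    DISCOURAGED = "discouraged"
    PROHIBITED = "prohibited"


# Hypothetical policy: (incoming service type, resident service type) -> rule.
POLICY = {
    ("web", "database"): Coexistence.PROHIBITED,
    ("web", "cache"): Coexistence.RECOMMENDED,
    ("script", "cluster"): Coexistence.DISCOURAGED,
    ("app", "monitoring-agent"): Coexistence.REQUIRED,
}


def check_placement(incoming, residents, policy=POLICY):
    """Return (allowed, warnings, blockers) for placing incoming on a host.

    residents is the set of service types already running on the host.
    """
    warnings, blockers = [], []
    for (service, other), rule in policy.items():
        if service != incoming:
            continue
        present = other in residents
        if rule is Coexistence.PROHIBITED and present:
            blockers.append(f"{incoming} is prohibited alongside {other}")
        elif rule is Coexistence.REQUIRED and not present:
            blockers.append(f"{incoming} requires {other}")
        elif rule is Coexistence.DISCOURAGED and present:
            warnings.append(f"{incoming} is discouraged alongside {other}")
    return not blockers, warnings, blockers


print(check_placement("web", {"database"}))
# (False, [], ['web is prohibited alongside database'])
```

In this sketch, a discouraged combination produces a warning rather than a refusal, which leaves the final decision to the operational process. Whether that is the right behavior is a policy choice.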

Service removal use cases present dependency checking opportunities to examine the impact of the cleanup of routing tables, memory footprint information on the other services running in that environment, or the appropriate notification for other services that might depend on the service about to be removed.

Resource Management

Mobility requires extending the method of assigning resource management shares (the fraction of operating system CPU, memory, or I/O allocated) for the case in which an unknown final number of services might run on a machine, leading to situationally dependent fractions being given to a particular project.

As an example, consider the case in which it is possible to provision a service component onto one of two servers. Server 1 has four CPUs and hosts two running service components, each assigned 50 CPU shares, and Server 2 has eight CPUs and hosts three service components, two with 15 shares and one with 30 shares. The new service component requires a minimum of two CPUs to run effectively, which results in very different calculations depending on the chosen server. On Server 1, two of four CPUs is half the machine, which requires 100 shares against the 100 already assigned. On Server 2, two of eight CPUs is one quarter of the machine, which requires 20 shares against the 60 already assigned. You will need a means to store the running tally of shares assigned in each server container, along with policies for each service's minimum resource requirements, so that provisioning additional service components on Server 1 or 2 does not starve the original service component of its two CPUs. Keeping this tally makes it possible to renormalize assigned shares as required after every addition or removal of a service that is in a resource management project.
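The share arithmetic in this example can be reproduced with a short sketch. It assumes that, under contention, each component's CPU entitlement is proportional to its shares divided by the total shares assigned on the server. Solving S / (T + S) = c / N for the new component's shares S gives S = T * c / (N - c), where T is the total of the existing shares, c is the number of CPUs needed, and N is the number of CPUs in the server. The function names are illustrative only.

```python
def shares_for_cpus(existing_shares, total_cpus, cpus_needed):
    """Shares a new component needs to be entitled to cpus_needed CPUs."""
    if not 0 < cpus_needed < total_cpus:
        raise ValueError("cpus_needed must be between 0 and total_cpus, exclusive")
    total = sum(existing_shares)
    if total <= 0:
        raise ValueError("existing shares must sum to a positive number")
    return total * cpus_needed / (total_cpus - cpus_needed)


def cpu_entitlements(shares, total_cpus):
    """CPU entitlement of each component when every component is busy."""
    total = sum(shares)
    return [share / total * total_cpus for share in shares]


# Server 1: four CPUs, two components with 50 shares each.
print(shares_for_cpus([50, 50], 4, 2))       # 100.0
# Server 2: eight CPUs, components with 15, 15, and 30 shares.
print(shares_for_cpus([15, 15, 30], 8, 2))   # 20.0

# Entitlements after the new component is added to each server.
print(cpu_entitlements([50, 50, 100], 4))     # [1.0, 1.0, 2.0]
print(cpu_entitlements([15, 15, 30, 20], 8))  # [1.5, 1.5, 3.0, 2.0]
```

The second pair of results shows why the tally matters: on Server 1 each original component drops from an entitlement of two CPUs to one as soon as the new component is added, so any further addition requires the shares to be recalculated against each component's minimum.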

Capacity and Performance Measurements

The ability to predict and control capacity and performance becomes more important as service density increases. If you are not using Solaris™ Resource Manager or Solaris™ 10 Zones, one service's tuning or growth needs can affect other services. You must rely on planned capacity test procedures to measure and describe applications and services before promotion to production. The service testing must be performed in stand-alone mode to measure pure capacity and performance characteristics, as well as in an environment with other applications and services so that you can study issues that might arise from mixing a particular set of services. The adoption of rapid provisioning with the N1 Grid SPS or N1 Grid PS enables the rapid setup and teardown of these testing environments and easier movement between the development, test, and production environments as part of the service production life cycle.