14.1 Keeping data on the mainframe

In this section, we introduce the devices that can be used by Linux on the mainframe. Then, we discuss some technical aspects of keeping data on the mainframe. This technical information can help you decide what to bring to Linux on the mainframe. The sections on the data management tasks that follow build on these technical concepts.

14.1.1 Which devices are there for you to use?

Because Linux on the mainframe supports both traditional mainframe devices and Open Standard storage devices, you have considerable freedom in choosing where to place your data.

Traditional mainframe devices

The basic functions of mainframe devices tend to be covered by generic device drivers. However, because manufacturers generally treat their device interfaces as proprietary, special device features and full error recovery are usually not supported by generic drivers. To close this gap, some manufacturers provide device-specific drivers as Object Code Only (OCO) modules.

With the exception of some older devices, mainframe shops can continue to use the devices they already have with Linux.

Disk devices: Table 14-1 shows some of the disk devices (DASD, in mainframe language) you can use with Linux on the mainframe (the '**' signifies any specification):

Table 14-1. Supported IBM disk devices
Control unit type/model	Device type/model	Restrictions
3990(2105)/**	3380/**
3990(2105)/**	3390/**
9343/**	9345/**	Basic error recovery only
6310/**	9336/**	VM virtual disk in storage only
3880/**	3370/**	Basic error recovery only

Also, Linux on the mainframe supports more recent devices such as the IBM TotalStorage Enterprise Storage Server (ESS).^[30]

^[30] IBM TotalStorage Enterprise Storage Server is also known as "Shark."

Because operating systems and their file systems tend to have unique data formats, Linux has its own special formatting of zSeries disks. As of the writing of this book, there is no common data format for Linux and mainframe operating systems such as z/OS or z/VM that allows both operating systems to read or write common files.

There is a disk format (Compatible Disk Layout) that allows Linux file systems to be managed by zSeries storage management products, for example, DFSMS (Data Facility Storage Management Subsystem) for backup and restore, or GDPS (Geographically Dispersed Parallel Sysplex) for disaster recovery. However, this disk format does not allow Linux data to be processed from operating systems other than Linux.

Tape drives: You can use tape drives that are compatible with IBM 3480, 3490, and 3590 magnetic tape subsystems.

FCP attached devices

As of late 2002, Open Systems devices can be attached to a zSeries machine through a Fibre Channel Protocol (FCP) link. The generic SCSI device drivers provide basic support for most devices, such as tape drives and libraries or fiber-attached SCSI disks. However, the generic drivers are restricted in functionality and reliability. These restrictions can be overcome by using vendor-supplied device drivers (for example, "ibmtape", provided by IBM for tape devices).

The limiting factor for "official" support is the immense test effort needed to qualify the scores of SCSI-compliant devices that FCP attachment opens up to Linux on the mainframe. Many devices are known to run but have not been tested exhaustively. On the other hand, a small number of devices with limited SCSI compliance are not suitable for Linux on the mainframe.

At the time of writing, we anticipate that IBM and other vendors will issue an increasing number of "official" support statements as qualification tests are completed for specific devices.

SAN

You can also attach a Storage Area Network (SAN) through an FCP link. However, operating system images, even on the mainframe, inherit the unresolved security and management issues that affect SANs on other platforms.

SANs are considered enterprise-ready in environments where a single operating system controls the entire SAN. As of the writing of this book, SANs are still subject to data isolation problems in heterogeneous environments and for usage across security domains. Intense work is under way to address these problems. In anticipation of a breakthrough on these management issues, Linux on the mainframe already supports SANs.

14.1.2 Data sharing

Here, we are not using sharing to mean a particular technical implementation, but rather to describe the logical instances in which multiple users need access to a common set of data.

With Linux on the mainframe, it is likely that you are not dealing with a single Linux image and its data, but with data that are needed by multiple Linux images and maybe even a z/OS image. This sharing can be through NFS, by getting a point-in-time copy of the file of interest, or by making a program call to another application for the required data. It is not only possible but sometimes also desirable to keep data on z/OS. The mainframe provides the communication methods and there are numerous connectors (see 19.3, "Connectors to back-end systems") that facilitate the use of data in z/OS resources from Linux on the mainframe.

Sharing data raises questions about who owns which data and who is responsible for managing them. Data could be owned and controlled by an individual user, but also by a Linux image or by an application.

Where data are created by an application, it is usually the application that owns them. Especially where interrelated data are created and used in multiple locations, the application is best suited to determine what constitutes a consistent set of data (for example, for backup purposes). If the Linux image shares the use of data with a z/OS image, the data are probably directly attached to z/OS by a channel path so that you can exploit z/OS data management capabilities.

For data management, it is important that all parties are aware of who owns the data, who uses the data, and who is responsible for which aspects of their management.

Data sharing reduces the amount of required storage, both for the original data and for backup copies. It can, however, introduce latency for getting data from the owner, which might lead to unacceptable response times.

Data sharing need not be the best solution for providing the same data to multiple applications or users. Where sharing is advantageous, Linux on the mainframe can exploit the mainframe technology with its fast internal communication methods for accessing the data of other operating systems on the same machine. The next section is about the opposite of sharing: privacy.

14.1.3 Data isolation in a virtualized environment

Multiple operating systems were running on the mainframe long before the emergence of Linux and Linux on the mainframe. Consequently, isolating data and protecting each operating system's data from other operating systems is not a new concern. It has long been addressed, and robust isolation mechanisms are at hand.

Channel paths

To understand data isolation and access control on the mainframe, we need to know a few things about channel paths. Channel paths are the physical access paths with their associated logic and controls that the mainframe machines use to access data (see Figure 2-13). The zSeries architecture supports 256 channel paths.

The number of devices you can attach to a channel path depends on the path type. In total, a maximum of 64 K devices can be attached to a single hardware machine. Even in the old days of 3-gigabyte disks, you could directly attach close to 200 terabytes of disk data. In Open Systems environments, comparable capacities can be provided by using a smaller number of large disks rather than a large number of small disks, as is typical for the mainframe.
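The capacity estimate above can be checked with a few lines of arithmetic. The following Python sketch assumes that "64 K" means 65,536 device addresses and that every address holds one 3-gigabyte disk; both the function name and the choice of binary versus decimal terabytes are our own assumptions for illustration.

```python
# Back-of-the-envelope check of the capacity figure quoted in the text.
MAX_DEVICES = 64 * 1024   # "64 K" device addresses, read as 65,536 (assumption)
DISK_SIZE_GB = 3          # one older disk volume per device address (assumption)


def attachable_capacity_tb(devices: int, disk_gb: float, gb_per_tb: int = 1024) -> float:
    """Total capacity in terabytes if every device address holds one disk."""
    return devices * disk_gb / gb_per_tb


binary_tb = attachable_capacity_tb(MAX_DEVICES, DISK_SIZE_GB)         # 1 TB = 1024 GB
decimal_tb = attachable_capacity_tb(MAX_DEVICES, DISK_SIZE_GB, 1000)  # 1 TB = 1000 GB

print(f"{MAX_DEVICES} devices x {DISK_SIZE_GB} GB: "
      f"{binary_tb:.0f} TB (binary) or {decimal_tb:.1f} TB (decimal)")
```

Either way of counting lands just under 200 terabytes, which matches the order of magnitude given in the text.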

You can share channel paths among operating systems. Path sharing is a necessity if you intend to run hundreds of Linux images. Path sharing is also a means to save on cabling and adapters (channel cards). Conversely, it is common practice to provide more than one channel path for accessing a particular device (multipathing). This redundancy eliminates cabling as a single point of failure and can boost performance.
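The benefit of multipathing can be pictured with a small model. In the real machine, path selection is handled by the channel subsystem, not by user code; the sketch below only illustrates the idea that a device remains reachable as long as at least one of its paths is operational. The path identifiers and the function name are hypothetical.

```python
from typing import Dict, List, Optional


def pick_path(paths: List[str], operational: Dict[str, bool]) -> Optional[str]:
    """Return the first path that is currently operational, or None if all are down."""
    for path_id in paths:
        if operational.get(path_id, False):
            return path_id
    return None


# Hypothetical device that is reachable over two channel paths.
paths_to_device = ["40", "41"]
status = {"40": True, "41": True}

print("normal operation:", pick_path(paths_to_device, status))

status["40"] = False  # simulate a failed cable on the first path
print("after one failure:", pick_path(paths_to_device, status))

status["41"] = False  # both paths down: the device is unreachable
print("after two failures:", pick_path(paths_to_device, status))
```

With a single path, the first failure would already make the device unreachable; with two paths, it takes two independent failures.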

Controlling access to devices

With a zSeries machine running numerous Linux images, you might ask how one image's data could possibly be kept safe from the other images. There are several lines of defense:

The hardware definitions
z/VM
The Linux images

The combination of these control mechanisms provides flexibility for addressing the individual data isolation and sharing needs of different installations.

The hardware definitions: The mainframe accesses devices by means of channel paths, that is, the work queue, the involved system assist processor, and the cabling. A channel path constitutes the hardware logic and processing required to access a device through a particular communication line (for example, a parallel link or a serial ESCON or FICON link). The hardware looks only for channel paths and devices that are defined to it in a special file, the IOCDS (I/O configuration data set). Devices must be defined in the IOCDS to be detected by the zSeries hardware. Undefined devices are inaccessible to the hardware and all software that runs on it.

To isolate data, you can dedicate a device, that is, define it in the IOCDS only to a particular LPAR. As a result, programs or operating systems that run in other LPARs cannot access the isolated data.

You also can dedicate a channel path with all of its devices to a particular LPAR. Channel path dedication is often used for allocating bandwidth to one LPAR in favor of others. For data isolation, all possible paths to a device would have to be dedicated.
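The requirement that all paths be dedicated can be expressed as a simple check. The sketch below is a simplified model, not IOCDS syntax: it looks only at path dedication and ignores device-level definitions. All LPAR names, path identifiers, and device numbers are hypothetical.

```python
from typing import Dict, Set

# Hypothetical configuration: which LPARs may use each channel path ...
path_access: Dict[str, Set[str]] = {
    "40": {"LPAR1"},            # dedicated to LPAR1
    "41": {"LPAR1"},            # dedicated to LPAR1
    "50": {"LPAR1", "LPAR2"},   # shared path
}

# ... and which channel paths lead to each device.
device_paths: Dict[str, Set[str]] = {
    "A000": {"40", "41"},
    "A001": {"40", "50"},
}


def lpars_reaching(device: str) -> Set[str]:
    """All LPARs that can reach the device over at least one of its paths."""
    reachable: Set[str] = set()
    for path_id in device_paths[device]:
        reachable |= path_access[path_id]
    return reachable


def is_isolated_to(device: str, lpar: str) -> bool:
    """True only if every path to the device is dedicated to the given LPAR."""
    return lpars_reaching(device) == {lpar}


for dev in sorted(device_paths):
    print(dev, "isolated to LPAR1:", is_isolated_to(dev, "LPAR1"))
```

In this model, device A001 is not isolated even though one of its two paths is dedicated, because the shared path still lets a second LPAR reach it.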

z/VM: z/VM can detect the defined devices on all channel paths that are defined to its LPAR (or the machine, if z/VM runs natively). A guest can use only the devices that z/VM defines to it. These definitions can be changed dynamically from z/VM. z/VM also can split the real devices into smaller virtual devices (minidisks) and control the guest's access at the granularity of the virtual devices.
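To make the minidisk idea concrete, the following sketch models a real disk as a range of cylinders and each minidisk as a contiguous extent within it. This is a conceptual model only, not z/VM directory syntax; the guest names, extent sizes, and class name are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Minidisk:
    owner: str      # guest that is given this virtual device (hypothetical name)
    start: int      # first real cylinder of the extent
    cylinders: int  # size of the extent in cylinders

    def to_real(self, virtual_cyl: int) -> int:
        """Map a guest-visible cylinder number to a cylinder on the real disk."""
        if not 0 <= virtual_cyl < self.cylinders:
            raise ValueError("cylinder lies outside this minidisk")
        return self.start + virtual_cyl


def overlaps(a: Minidisk, b: Minidisk) -> bool:
    """True if two extents share at least one real cylinder."""
    return a.start < b.start + b.cylinders and b.start < a.start + a.cylinders


# Two hypothetical minidisks carved out of the same real volume.
disk_lx = Minidisk(owner="Lx", start=1, cylinders=1000)
disk_ly = Minidisk(owner="Ly", start=1001, cylinders=1000)

print("extents overlap:", overlaps(disk_lx, disk_ly))
print("Ly virtual cylinder 0 is real cylinder", disk_ly.to_real(0))
```

Each guest addresses its minidisk starting at cylinder 0 and cannot name a cylinder outside its own extent, which is what gives access control at the granularity of the virtual device.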

The Linux images: With root access to Linux, you have two more ways to control data access: at the device level and at the user level. Both can be suitable control mechanisms if only trusted personnel or applications have root access.

You can instruct a Linux image to look only for specific devices. As a result, Linux will not be able to detect any other devices, even if the hardware definitions and z/VM have made them available.

In addition, Linux can restrict the access rights of individual users to the data in Linux file systems.
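User-level control relies on the standard Linux file permission bits. As a minimal sketch, the following Python code restricts a file so that only its owner can read and write it; the file name and helper function are hypothetical, and the example assumes an ordinary Linux file system that honors permission bits.

```python
import os
import stat
import tempfile


def restrict_to_owner(path: str) -> int:
    """Allow only the file's owner to read and write it; return the resulting mode bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to mode 0600
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as workdir:
    data_file = os.path.join(workdir, "payroll.dat")  # hypothetical file name
    with open(data_file, "w") as handle:
        handle.write("confidential\n")

    mode = restrict_to_owner(data_file)
    print("permission bits:", oct(mode))
    print("group or others have access:", bool(mode & (stat.S_IRWXG | stat.S_IRWXO)))
```

Permission bits do not constrain the root user, which is why this mechanism is suitable only if root access is limited to trusted personnel or applications.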

Example: Using z/VM for access control

One widely used method is to over-define the relatively static hardware definitions and the Linux definitions, and then use the dynamic z/VM definitions as the effective control. Over-defining means defining devices that do not currently exist but may be added in the future. A newly added device that has already been defined to the hardware can be brought online without taking the machine down.
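One way to picture this method is as an intersection of three sets: a guest can use a device only if the hardware definitions, z/VM, and the Linux image all allow it. The sketch below models this under stated assumptions; the device numbers, the size of the over-defined range, and the function names are hypothetical.

```python
from typing import Set


def device_range(first: int, count: int) -> Set[str]:
    """Build a set of four-digit hexadecimal device numbers."""
    return {f"{number:04X}" for number in range(first, first + count)}


def visible_devices(hardware: Set[str], zvm: Set[str], linux: Set[str]) -> Set[str]:
    """A device is usable only if all three layers of definitions allow it."""
    return hardware & zvm & linux


# Static layers are over-defined: 16 device numbers each (assumed size).
hardware_defined = device_range(0xA000, 16)
linux_configured = device_range(0xA000, 16)

# The dynamic layer is the effective control.
zvm_granted = {"A000", "A001"}

print(sorted(visible_devices(hardware_defined, zvm_granted, linux_configured)))

# Granting one more device changes only the z/VM layer.
zvm_granted.add("A002")
print(sorted(visible_devices(hardware_defined, zvm_granted, linux_configured)))
```

Because the static layers already include the extra device numbers, the only definition that has to change when a device is granted or withdrawn is the dynamic one.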

Figure 14-2 illustrates a z/VM system that knows of nine devices (A000 through A008) and has three Linux guests (Lx, Ly, and Lz). Figure 14-2 also shows the statements that z/VM uses to selectively make devices available to the guests.

Figure 14-2. z/VM control of storage devices


z/VM can give a guest exclusive access (ATTach) or shared access (LINK) to a device. Figure 14-3 shows the logical views of the devices that the guest systems have according to the statements of Figure 14-2.

Figure 14-3. Linux guests' view of devices


z/VM ensures that each guest can detect only the devices that are defined to it.
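The difference between exclusive and shared access can be sketched as a small bookkeeping model. The code below is not z/VM command syntax, and the assignments in it are hypothetical; they are not taken from Figure 14-2. It only illustrates the rule that an exclusively attached device appears in one guest's view, while a shared device can appear in several. Details of the real commands are ignored.

```python
from typing import Dict, Set


class DeviceAssignments:
    """Toy model of exclusive (attach) and shared (link) device access."""

    def __init__(self) -> None:
        self._exclusive: Dict[str, str] = {}    # device -> the one guest that owns it
        self._shared: Dict[str, Set[str]] = {}  # device -> guests that share it

    def attach(self, device: str, guest: str) -> None:
        """Give one guest exclusive access; fail if the device is already in use."""
        if device in self._exclusive or self._shared.get(device):
            raise RuntimeError(f"{device} is already in use")
        self._exclusive[device] = guest

    def link(self, device: str, guest: str) -> None:
        """Give a guest shared access; fail if the device is exclusively attached."""
        if device in self._exclusive:
            raise RuntimeError(f"{device} is attached exclusively")
        self._shared.setdefault(device, set()).add(guest)

    def view(self, guest: str) -> Set[str]:
        """The devices this guest can detect."""
        own = {dev for dev, owner in self._exclusive.items() if owner == guest}
        shared = {dev for dev, guests in self._shared.items() if guest in guests}
        return own | shared


# Hypothetical assignments for the three guests named in the text.
assignments = DeviceAssignments()
assignments.attach("A000", "Lx")
assignments.attach("A001", "Ly")
for name in ("Lx", "Ly", "Lz"):
    assignments.link("A002", name)

for name in ("Lx", "Ly", "Lz"):
    print(name, "sees", sorted(assignments.view(name)))
```

A second attach of A000 to another guest raises an error in this model, which mirrors the exclusivity described above.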