Engineering, scientific, academic, and entertainment segments all demonstrate specialized needs, given their increasing reliance on data characterized by its unstructured format. The data organizational model is typically a file system containing information used in GIS seismic applications, image and video streaming, and model simulations such as weather patterns and nuclear explosions. The files associated with these applications are typically large and subject to multiple read I/Os from many users, both local and remote.
Another characteristic of this data is the way it is created. On the high end, the data may come from seismic studies and surveys in which millions of digital measurements are captured to model a geophysical structure. The measurements are contained in files that are later used by simulation applications to visualize a projected oil deposit. Consequently, the data sets are large and must be accessible to the engineers and scientists who analyze the visualization. Traditional client/server storage solutions become very expensive, because growing storage capacities and resource demands require a general-purpose server with compute capacity far beyond what is actually needed.
On the low end, data may be generated from the increasing number of digital images used in governmental work. For example, images of land, land surveys, and improvements can provide a significant productivity improvement in many small governments. The problem is that storage of the images greatly exceeds the limitations of the departmental servers employed at this level. Traditional client/server systems are very expensive, especially when purchased just to add storage capacity.
File systems continue to be the most common level of data organization for these applications. Given their straightforward access needs (no relational model is necessary to perform "what if" analyses on the data itself), the data is formatted to suit the mathematical and computational operations of the application. This produces a processing cycle of large data transfers followed by compute-intensive processing. Consequently, the arrival rate of large data transfers initiated by users may appear random; however, closer investigation of I/O activity may reveal processing trends that define the level of concurrency required.
User access volumes are typically much lower than in traditional web, departmental, or general business OLTP applications, and access is characterized by specialists who process the data. The major client change in these areas is the makeup of workstations for specialized end users, with increased storage capacity at the client level. This extends the I/O content of the transactions that are executed, as well as the requirements for file transfers between workstations and NAS storage devices.
Processing transactions with gigabyte-sized requirements necessitates multiple network data paths. Moreover, as with the enormous database structures described for data warehouse applications, the need to physically partition the file system across storage arrays because of its size requires still more data paths. If we augment this requirement further with file updates that may need to be performed in isolation, stored, or archived, the number of data paths becomes significant.
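The partitioning idea can be made concrete with a small sketch that maps segments of one large file round-robin across several storage arrays, each reached over its own data path. The mount-point names and segment size here are hypothetical, chosen only to illustrate why segment count drives the number of paths in play:

```python
def place_segments(file_size, segment_size, mount_points):
    """Map each fixed-size segment of a large file to a data path
    (represented here by a mount point), round-robin."""
    n_segments = -(-file_size // segment_size)  # ceiling division
    return {i: mount_points[i % len(mount_points)] for i in range(n_segments)}

# A 10 GB file in 1 GB segments spread over three hypothetical arrays:
layout = place_segments(file_size=10 * 2**30,
                        segment_size=2**30,
                        mount_points=["/nas/a", "/nas/b", "/nas/c"])
```

Even this simple layout shows the operational cost: an isolated update or archive of one file now touches every array that holds one of its segments.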
Figure 19-6 shows an example of a specialized NAS configuration supporting the seismic analysis application. An interesting problem to contemplate is the set of operational activities involved in updating the seismic files, archived files, or remotely transmitted files (see Chapter 20 for more on integration issues).