For applications that process large amounts of data, the cost of I/O can be a major, and in some cases dominant, factor in the processing time. During our parallelization and optimization effort within the AIPS++ group we have processed large datasets; the input visibility datasets have been about 1 GB in size, and the output images (for each cube) more than 50 GB. The time to process such a dataset is about 10 days on an SGI Origin 2000 with 128 processors. At NCSA, acquiring access to a machine with 128 processors for 10 days would be difficult; moreover, even if the dedicated time could be obtained, it is unlikely that the experiment could be carried out more than once. Parallelization efforts within AIPS++ are continuing; simultaneously, our group is collaborating with the Pablo group at the University of Illinois to characterize the I/O of our parallel imager application, pimager. This is done to understand the I/O needs intrinsic to our application, as well as those forced upon us by the I/O systems themselves. We will then use the results of our I/O analysis to develop and test solutions to the identified I/O bottlenecks, primarily parallel I/O.
The Pablo academic research group is a part of the Department of Computer Science at the University of Illinois at Urbana-Champaign. Members of the group investigate the interaction of architecture, system software, and applications on large-scale parallel and distributed computer systems. The Pablo group is a component of the NCSA enabling technologies team responsible for distributed computing. Key research foci are exploration of the following:
Increasing effective I/O data rates will require at least two actions. First, one must understand the I/O needs intrinsic to current applications, versus those forced upon application developers by the limitations of current systems. Second, these data must be used to develop and test flexible file caching and prefetching algorithms that can adapt to application demands.
The Pablo research group has developed a variety of software tools for performance analysis and optimization of parallel and distributed systems. The resulting software is intended primarily for academic and government research sites.
The I/O Analysis component of the Pablo Performance Analysis Environment contains programs that produce reports summarizing the I/O activity of an application from the I/O event records in a Self-Defining Data Format (SDDF) trace file generated by the Pablo Trace Library I/O Extension.
Inclusion of I/O instrumentation essentially replaces the UNIX I/O calls with analogous calls to the Pablo I/O library. The Pablo I/O library is a special case of the more general Pablo trace capture library. The purpose of the Pablo trace capture library is to provide routines that enable the capture of trace information for a number of events in a parallel program written in C, C++, or FORTRAN. The items that can be traced include message-passing calls, procedure and function calls, user-defined events, and most recently, input/output events. The data gathered for each event of interest includes the following:
The trace output is written to a trace file in the Pablo SDDF format. For programs written in C and C++, instrumentation may be done manually or automatically. Both manual and automatic instrumentation require that a header file (IOTrace.h) be included and that calls be made to procedures that initialize and end tracing. A large number of common I/O events may be automatically traced by Pablo, such as open, close, read, write, get, put, and seek events. A number of machine-dependent events are also included. For example, global opens, I/O mode events, file size directives, and synchronous and asynchronous I/O are supported for the Intel Paragon. For formatted I/O, such as fprintf and fscanf, manual instrumentation is necessary. This is easily accomplished by bracketing each event of interest with calls to the Pablo trace library.
The AIPS++ library has been instrumented, and if the Pablo library is installed, the AIPS++ code can be linked against it to include the instrumentation. The user can set environment variables that specify which applications are instrumented and where the trace files are written.
Our I/O tests were carried out on a new 128-processor SGI Origin 2000 at NCSA. Each processor is a 250 MHz R10000 with 4 MB of L2 cache and a peak speed of 500 MFLOPS. The I/O system is a Fibre Channel JBOD disk system. We had the machine in a dedicated run for 10 days.
Dave Westpfahl (New Mexico Tech) supplied us with four pointings of calibrated VLA HI visibility data of M33 (B, C, and D arrays), which were used to create the images. Table 1 shows the imaging parameters (i.e., image size and number of channels) for the AIPS++ I/O experiments carried out.
By analyzing the output of the I/O experiments we have concluded the following about I/O in the AIPS++ system:
One of the most important results of these experiments was the characterization of I/O to temporary files. For large images, I/O to the temporary lattice file (TempLattice) was 250 GB for both reads and writes, for a total of 500 GB. The disk I/O rate to the temporary lattice file (100 MB/sec) was somewhat lower than that for some other files, which reached twice that rate. Figure 1 shows the I/O (reads and writes) to the temporary file throughout a large imaging run. The figure shows that the I/O consisted of writing data to the temporary lattice file and immediately reading the same data back. The relatively low I/O rate to temporary files was likely due to the high frequency and small size of the I/O requests (see Figure 1).