The goals driving the development of an On-the-Fly Calibration system at the Space Telescope Science Institute (STScI) are covered elsewhere (Lubow & Pollizzi 1999). This paper will focus on the technical details of the OTFC pipeline design.
The WFPC-2 was chosen as the first instrument for implementation due to its maturity and its stable calibration software base. Other HST instruments will be added to the system in the future, but WFPC-2 was selected first so that the OTFC support software could be developed with the fewest instrument-driven complications.
The OPUS architecture, described in detail in other papers, was designed and developed by the Data Processing Team at STScI. OPUS provides the framework and tools for building and operating a data reduction system that can spread processing across multiple nodes in a distributed cluster of machines. It supports parallel processing using the blackboard paradigm of interprocess communication. Beyond processing HST data at STScI, OPUS is now used for the pipelines of other missions, including FUSE, Integral, and, in the future, AXAF and SIRTF. OPUS can address the needs of a variety of missions since it is already written and released (Swade & Rose 1999), is consistently maintained by STScI, and is easily adapted to different requirements.
The decision to use OPUS for the OTFC pipeline was driven by several factors. The existing pre-archive calibration pipeline, which transforms WFPC-2 telemetry from HST into calibrated datasets for the HST Archive, has used OPUS for almost three years. That system has proven robust and provides a graphical user interface for managing and monitoring the data processing stream, which is especially useful in environments with a large data volume.
Another major driver was system scalability. The parallel, multi-processing capabilities of OPUS leave room for scaling the hardware architecture to meet the demands of the pipeline. Providing a user-friendly service like on-the-fly calibration will virtually guarantee that requests for recalibrated archive data increase. OPUS allows a system designer to add processors and disks with minimal reconfiguration, yielding a truly scalable architecture. Performance concerns dictate that scalability be addressed, since unacceptable performance could reduce usage of an OTFC system.
The OTFC WFPC-2 pipeline was targeted for a relatively short development timescale (less than one year). The short timescale accentuated the benefits of code reuse gained by using OPUS, along with existing STSDAS (Space Telescope Science Data Analysis System) tasks. Under OPUS it is easy to string together a series of programs, shell scripts, and tools to form a pipeline. Each stage in the OTFC pipeline performs a specific function, and this separation of work allows for greater parallel processing. Each pipeline task was implemented as a UNIX shell script, with the work accomplished using standard UNIX system commands, OPUS software tools, and STSDAS tools. The pipeline tasks are:
The pipeline consists of 9 shell scripts totaling 553 lines of code, with much reuse between modules in the areas of setup and error handling. The existence of generic OPUS software tools and STSDAS tools made this possible. The STSDAS tools perform the bulk of the processing, while the OPUS tools facilitate passing the datasets through the OTFC pipeline. A working prototype of the pipeline was up and running within two weeks and evolved into the production system.
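The stage structure described above can be sketched as a short shell script. This is a hypothetical illustration, not one of the actual nine scripts: each stage does one piece of work on one dataset and exits with a status OPUS can act on. The names `DATASET` and `OUTDIR`, the dataset root name, and the completion-marker convention are all assumptions for the sketch.

```shell
#!/bin/sh
# Hypothetical sketch of one OTFC pipeline stage. DATASET and OUTDIR
# are illustrative names, not real OPUS keywords; OPUS would normally
# supply these values when it invokes the stage.

DATASET=${1:-u2x40101t}          # dataset root name (made-up example)
OUTDIR=${OUTDIR:-/tmp/otfc_demo}

mkdir -p "$OUTDIR"

# Stand-in for the real work (e.g., an STSDAS task); record completion
# so a downstream stage could pick the dataset up.
echo "processed $DATASET" > "$OUTDIR/$DATASET.done"
```

A real stage would replace the `echo` with a call to an STSDAS tool or OPUS utility, but the overall shape (a few lines of setup, one unit of work, an exit status) is what keeps each script small.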
Automated error detection was added to the pipeline in the form of three short UNIX shell scripts. These scripts are reconfigured for each pipeline stage using the resource file feature of OPUS (explained below). They detect errors in processing and pass information along to DADS (the Data Archive and Distribution System) so that the end user will be informed of a delay in receiving part of their archive retrieval.
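The flavor of such an error-detection wrapper can be sketched as follows. This is a hedged illustration, not one of the three actual scripts: `STAGE_CMD` stands in for the stage's real work (which a resource file would name), and `notify_dads` is a hypothetical stand-in for however failure notices actually reach DADS.

```shell
#!/bin/sh
# Illustrative error-detection wrapper. STAGE_CMD and notify_dads are
# assumptions for the sketch, not actual OTFC names.

STAGE_CMD=${STAGE_CMD:-false}   # 'false' always fails, to exercise the error path
DATASET=${DATASET:-u2x40101t}

notify_dads() {
    # Real code would queue a message for DADS; here we just format one.
    echo "DADS notice: stage failed for dataset $1"
}

if ! $STAGE_CMD; then
    NOTICE=$(notify_dads "$DATASET")
    echo "$NOTICE"
fi
```

Because the command to run and the dataset come from the environment, the same wrapper serves every pipeline stage once OPUS points it at a different resource file.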
The entire OTFC pipeline could have been constructed as a series of shell scripts without using OPUS, but the benefits of OPUS are numerous. Each pipeline stage can obtain values from environment variables passed in by OPUS from an ASCII setup file, called a resource file. These resource files make it possible to change task parameters without changing code, and allow the same shell script to be used for processing different kinds of data. Parameters such as disk directories, file masks, and other instrument-specific information are passed into the script, so that different versions of the script or large IF-THEN-ELSE blocks are avoided.
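A minimal sketch of the resource-file idea, under the assumption that OPUS exports each resource-file entry as an environment variable before invoking a stage: the variable names `INPUT_DIR`, `FILE_MASK`, and `INSTRUMENT` are illustrative, not the actual OPUS keywords.

```shell
#!/bin/sh
# One script body serves many configurations: the values below would
# normally arrive from an OPUS resource file via the environment, so
# the fallbacks here are only stand-ins for a WFPC-2 configuration.

INPUT_DIR=${INPUT_DIR:-/data/otfc/incoming}
FILE_MASK=${FILE_MASK:-"*.d0h"}
INSTRUMENT=${INSTRUMENT:-WFPC-2}

# No IF-THEN-ELSE blocks keyed on instrument name are needed; a STIS
# resource file would simply supply different values.
CONFIG="$INSTRUMENT: $INPUT_DIR/$FILE_MASK"
echo "stage configured for $CONFIG"
```

Swapping in a different resource file reconfigures the stage for another directory, file mask, or instrument with no code change at all.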
OPUS provides distributed, parallel, multi-processing capability, so that any stage of a pipeline can be run on any node in the cluster, as long as each node can see the needed data disks. Multiple copies of processes can be running on a single node, or spread across multiple nodes. By processing observations in parallel, the machines in the cluster are more likely to be under a balanced load (CPU and I/O), improving their throughput efficiency. Experiments and experience at STScI have shown that this parallelism increases the throughput of the system when processing a set of observations. Processing of any single observation takes a bit longer, but the staggered pipeline design (much like those used in pipelined CPU chip architectures) provides better throughput, since processing of different observations in different pipeline stages can overlap. For OTFC, the calibration stage (CALWF2) is the bottleneck in the pipeline. Running 3-5 copies of CALWF2 per processor has demonstrated better throughput for the system. Of course, there is a point of diminishing returns, beyond which adding more copies of a process produces more swapping than additional useful work. Experimentation provides the best measure of the threshold where performance starts to degrade.
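The throughput argument can be illustrated with a toy shell script, not OPUS code: several copies of a stage drain a queue of datasets concurrently, so the work of different observations overlaps. The dataset names, the per-dataset cost, and the throttling scheme are all made up for the sketch.

```shell
#!/bin/sh
# Toy demonstration of running NCOPIES workers over a dataset queue.
# OPUS manages worker processes itself; this only shows why overlapping
# datasets improves throughput even though each one is slightly slower.

LOG=$(mktemp)
NCOPIES=3

process_one() {
    # Stand-in for a calibration run (e.g., CALWF2) on one dataset.
    sleep 0.1
    echo "calibrated $1" >> "$LOG"
}

i=0
for ds in ds1 ds2 ds3 ds4 ds5 ds6; do
    process_one "$ds" &
    i=$((i + 1))
    # Crude throttle: never more than NCOPIES workers at once.
    [ $((i % NCOPIES)) -eq 0 ] && wait
done
wait

COUNT=$(( $(wc -l < "$LOG") ))
rm -f "$LOG"
echo "completed $COUNT datasets"
```

With three workers, the six sleeps overlap in two batches rather than running back to back, which is the same effect the staggered OTFC pipeline achieves at scale. Raising `NCOPIES` past the machine's capacity would hit the diminishing-returns point described above.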
OPUS multi-processing also allows built-in redundancy, so that if a dataset manages to crash one of the OTFC processes, other copies of the same process are running that can continue to operate on new datasets. This redundancy also applies to a full system crash. If hardware problems bring down one of the processors in a cluster, processes distributed to other machines through OPUS can continue to run and maintain a level of system throughput.
At this time, the OTFC WFPC-2 pipeline is in formal testing and will be fielded for in-house retrievals within STScI in January 1999. After a period of satisfactory in-house testing, the WFPC-2 instrument team at STScI will decide when it is appropriate to open up WFPC-2 OTFC retrievals to the full scientific community.
The next phase of the project will be to integrate the recalibration of Space Telescope Imaging Spectrograph (STIS) data into OTFC. The reuse of WFPC-2 pipeline code in this integration effort is expected to exceed 90 percent.
Lubow, S. & Pollizzi, J. 1999, this volume, 187
Swade, D. A. & Rose, J. F. 1999, this volume, 111