The advent of large CCD mosaics is creating a challenge for archive centers. With data production rates in excess of 20 GB/night, efficient methods are needed to reduce the size of the data on archive media while preserving the information content.
We report here on recent improvements to the compFITS (Véran and Wright 1994) non-lossy astronomical data compression method that address this challenge efficiently. The method used in compFITS2 (see Figure 1) artificially splits the bit-planes of the (integer) pixels into two parts: one containing the bit-planes with large, noise-like pixel-to-pixel variations (the least significant bits, LSBs) and the other containing the bit-planes with lower, more compressible entropy (the most significant bits, MSBs). The optimal partition is determined by analyzing a subset of the original image. Since popular non-lossy compression programs such as compress or gzip compress the MSBs well but not the LSBs, it is more efficient to pass only the compressible MSBs to these programs. To compress the MSBs, compFITS2 can use any compression program as a plug-in. The result is a valid FITS file: compFITS2 stores the compressed and uncompressed data for each extension in a binary table extension, thus preserving the primary and extension headers in a readable form.
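The bit-plane split described above can be illustrated with a minimal Python sketch. This is not the compFITS2 implementation: the function names (`split_planes`, `join_planes`, `stored_size`, `best_split`) are hypothetical, `zlib` stands in for the compression plug-in, the data are synthetic, and the rule for choosing the partition (try each split on a sample and keep the smallest total) is an assumption, since the text says only that the partition comes from analysis of a subset of the image.

```python
import random
import zlib
from array import array


def split_planes(pixels, n_lsb):
    """Split unsigned 16-bit pixels into (MSB part, LSB part)."""
    mask = (1 << n_lsb) - 1
    msb = array("H", (p >> n_lsb for p in pixels))
    lsb = array("H", (p & mask for p in pixels))
    return msb, lsb


def join_planes(msb, lsb, n_lsb):
    """Inverse of split_planes: rebuild the original pixels exactly."""
    return array("H", ((m << n_lsb) | l for m, l in zip(msb, lsb)))


def stored_size(pixels, n_lsb):
    """Bytes needed: zlib-compressed MSBs plus raw LSBs at n_lsb bits each."""
    msb, _ = split_planes(pixels, n_lsb)
    raw_lsb_bytes = (len(pixels) * n_lsb + 7) // 8
    return len(zlib.compress(msb.tobytes())) + raw_lsb_bytes


def best_split(sample, max_lsb=8):
    """Assumed selection rule: the n_lsb giving the smallest size on the sample."""
    return min(range(max_lsb + 1), key=lambda n: stored_size(sample, n))


# Synthetic 16-bit "sky": constant background plus Gaussian noise (sigma = 8).
rng = random.Random(0)
pixels = array(
    "H", (max(0, min(65535, round(rng.gauss(1000, 8)))) for _ in range(20000))
)

n_lsb = best_split(pixels[:4096])  # analyse only a subset of the image
msb, lsb = split_planes(pixels, n_lsb)
assert join_planes(msb, lsb, n_lsb) == pixels  # the split is non-lossy

# Chosen split, size with the split, size when everything goes to zlib.
print(n_lsb, stored_size(pixels, n_lsb), len(zlib.compress(pixels.tobytes())))
```

Because `n_lsb = 0` (no split) is among the candidates, the chosen partition is never worse on the sample than handing every bit-plane to the compressor; how much it gains depends on the noise level of the data.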
The obvious first step in testing was to verify that compFITS2 does indeed preserve content through the compression/decompression process. A simple Unix cmp of the original and decompressed files does not always work: if the original header did not conform entirely to the FITS standard, the two files differ syntactically, because the cfitsio library writes conforming FITS files and corrects any non-conforming header cards from the original file. The original and decompressed files were therefore compared using a set of IRAF tasks.
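The idea behind such a comparison, equal keyword values and identical pixel data even when the header bytes differ, can be sketched in Python. This is an illustration only, not the IRAF procedure used for the tests: all function names are hypothetical, only the primary header of a single-HDU file is handled, and the card normalization is deliberately crude.

```python
BLOCK = 2880  # FITS files are organised in 2880-byte blocks
CARD = 80     # each header card is 80 ASCII characters


def read_header(raw):
    """Return (cards, data_offset) for the primary header of a FITS byte string."""
    cards = []
    for pos in range(0, len(raw) - CARD + 1, CARD):
        card = raw[pos:pos + CARD].decode("ascii", errors="replace")
        if card[:8].rstrip() == "END":
            end = pos + CARD
            # The header is padded out to a whole number of blocks.
            return cards, (end + BLOCK - 1) // BLOCK * BLOCK
        cards.append(card)
    raise ValueError("no END card found")


def normalize(card):
    """Reduce a card to (keyword, value), ignoring spacing and the comment.

    Simplification: the comment is cut at the first '/', so string values
    that themselves contain '/' are compared only up to that character.
    """
    keyword = card[:8].strip()
    rest = card[8:]
    if rest.startswith("="):
        return keyword, rest[1:].split("/", 1)[0].strip()
    return keyword, rest.strip()


def same_content(raw_a, raw_b):
    """True if both files carry the same keyword values and identical data bytes."""
    cards_a, off_a = read_header(raw_a)
    cards_b, off_b = read_header(raw_b)
    same_header = [normalize(c) for c in cards_a] == [normalize(c) for c in cards_b]
    return same_header and raw_a[off_a:] == raw_b[off_b:]


def make_fits(cards, data):
    """Build a tiny single-HDU FITS byte string (helper for the example only)."""
    header = "".join(c.ljust(CARD) for c in cards + ["END"]).encode("ascii")
    header += b" " * (-len(header) % BLOCK)
    return header + data + b"\0" * (-len(data) % BLOCK)


# Same keywords and data, but the first header uses loose column alignment.
loose = make_fits(
    ["SIMPLE  = T", "BITPIX  = 16", "NAXIS   = 1", "NAXIS1  = 4"],
    bytes(range(8)),
)
strict = make_fits(
    ["SIMPLE  =                    T", "BITPIX  =                   16",
     "NAXIS   =                    1", "NAXIS1  =                    4"],
    bytes(range(8)),
)
print(loose == strict, same_content(loose, strict))  # prints: False True
```

A byte-for-byte comparison (the equivalent of cmp) reports the two files as different, while the content-level comparison treats them as equal.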
Tests were run to determine whether compFITS2 introduces a time penalty or provides a significant gain over other methods. Because most compression programs do not support the multi-extension FITS format, we ran the tests on a large number of CFHT 8K Camera images (2kx4k detector, 1 image per file, 16.38 MB per file). We tested the original compFITS, a variety of non-lossy compression programs (compact, compress, gzip, bzip2), and compFITS2 using the same programs as plug-ins. The results are summarized in Table 1. Results for gzip and bzip2 are not included because they were consistently 10 times slower.
|CompFITS2 with compress||1784||16.38||7.67||46.8%||7.60||5.05|
|CompFITS2 with compact||877||16.38||7.13||43.5%||7.77||5.11|
The final tests were to determine performance on large multi-extension FITS image files. We chose to use the CFHT 12K Mosaic Camera files (2kx4k detector, 12 images per file, 196.64 MB per file). The tests compared compFITS2 with the compress "plug-in" against the stand-alone version of the compress program (compress being more of a standard than compact). On average, compFITS2 produced smaller files than compress (88 MB vs. 114 MB), so the compress output was about 29% larger. On average, compFITS2 was 6% slower than compress on file compression and 13% slower on file decompression. The results are summarized in Table 2.
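The size figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the three averages given in the text; the variable names are illustrative, and because the averages are rounded to the nearest megabyte the derived percentages are approximate.

```python
original_mb = 196.64   # CFHT 12K Mosaic file size quoted in the text
compfits2_mb = 88.0    # average compFITS2 output size
compress_mb = 114.0    # average stand-alone compress output size

# Compressed size as a fraction of the original file size.
ratio_compfits2 = compfits2_mb / original_mb
ratio_compress = compress_mb / original_mb

# Two ways of expressing the same difference between the two outputs.
compress_excess = compress_mb / compfits2_mb - 1    # compress output is larger by
compfits2_saving = 1 - compfits2_mb / compress_mb   # compFITS2 output is smaller by

print(f"compFITS2: {ratio_compfits2:.1%} of original, compress: {ratio_compress:.1%}")
print(f"compress output is {compress_excess:.1%} larger; "
      f"compFITS2 output is {compfits2_saving:.1%} smaller")
```

The first of the two differences, taken relative to the compFITS2 output, is the one that matches the quoted 29% within rounding; taken relative to the compress output, the same gap is closer to 23%.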
Pence, W. D. 1992, in ASP Conf. Ser., Vol. 25, Astronomical Data Analysis Software and Systems I, ed. D. M. Worrall, C. Biemesderfer, & J. Barnes (San Francisco: ASP), 22
Pence, W. D., White, R., & Greenfield, P. 2000, this volume, 551
Véran, J.-P. & Wright, J. R. 1994, in ASP Conf. Ser., Vol. 61, Astronomical Data Analysis Software and Systems III, ed. D. R. Crabtree, R. J. Hanisch, & J. Barnes (San Francisco: ASP), 519