Data Compression for NGST

The immense amount of data that the Next Generation Space Telescope (NGST) will produce and its distant orbit from Earth make it mandatory to do some amount of on-board image processing and data compression. This paper gives a summary of the performance of several lossless compression methods. We also show results of prescaling the image prior to compression using a square-root function. This imposes a slightly lossy compression, but the scaling can be adjusted so as to retain the desired number of noise bits.

1. Introduction

The Next Generation Space Telescope (NGST) will produce about 600 GB/day, assuming we use the NASA Yardstick 8k x 8k NIR camera (16 bits/pixel), save and transmit 64 non-destructive read-outs per image, and the camera is in continuous use (about 80 observations/day, 10³ s each). However, with an L2 halo orbit, the NASA NGST study estimates a downlink rate of 5.35 GB/day using X-band. Clearly the volume of data to downlink must be reduced by at least a factor of 100.
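The data-volume estimate above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that "8k x 8k" means 8192 x 8192 pixels and that 1 GB is 2^30 bytes, and the function names are hypothetical. Under those assumptions the raw volume comes out somewhat above the quoted figure of about 600 GB/day, and the required reduction is a little over a factor of 100.

```python
# Assumptions (not stated in the text): 8k = 8192 pixels, 1 GB = 2**30 bytes.
PIXELS = 8192 * 8192            # NASA Yardstick NIR camera, 8k x 8k
BYTES_PER_PIXEL = 2             # 16 bits/pixel
READOUTS_PER_IMAGE = 64         # non-destructive readouts saved per image
OBSERVATIONS_PER_DAY = 80       # about 80 observations of 10^3 s each
DOWNLINK_GB_PER_DAY = 5.35      # X-band estimate quoted in the text


def raw_volume_gb_per_day():
    """Raw data volume per day, in GB, before any on-board processing."""
    total_bytes = (PIXELS * BYTES_PER_PIXEL
                   * READOUTS_PER_IMAGE * OBSERVATIONS_PER_DAY)
    return total_bytes / 2**30


def required_reduction():
    """Factor by which the raw volume exceeds the downlink capacity."""
    return raw_volume_gb_per_day() / DOWNLINK_GB_PER_DAY


if __name__ == "__main__":
    print(f"raw volume: {raw_volume_gb_per_day():.0f} GB/day")
    print(f"required reduction: {required_reduction():.0f}x")
```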

Astronomical images are noisy, which makes them difficult to compress with lossless algorithms such as Huffman, Lempel-Ziv, run-length, or arithmetic coding. However, adjacent pixels tend to have similar values. Techniques such as Rice's algorithm (Rice, Yeh, & Miller 1993) and its derivatives (White & Becker 1998; Stiavelli & White 1997) can take advantage of this. In this paper, we show how some of these compression techniques would perform on NGST images. Unfortunately, the data volumes that remain after lossless compression still exceed the telemetry guidelines. We have therefore also looked into the feasibility of lossy compression by scaling the original image prior to the lossless compression. Under this scheme, we find substantial data reduction with a negligible effect on data quality.

2. The Data Compression Process: Ratios

The first and most important compression ratio, 64:1, is obtained by applying a cosmic ray rejection process and fitting the 64 readouts into a single image (Offenberg et al. 1999). (In fact, 65 readouts enter the cosmic ray rejection and fitting process; however, the first readout [dark frame] should rarely be downlinked.) An additional compression factor close to 3:1 is achieved by using predictive compression techniques such as lossless JPEG and Rice's algorithm. Dictionary-based lossless compression programs such as ``gzip'' and ``compress'' achieve lower compression ratios (see Table 1).

We can further reduce the data volume by using a prescaled image as input to the lossless encoder. The prescaling process, based on the square-root function, can be adjusted so as to retain as many bits of noise as desired. Similar results are obtained independently of the lossless compression technique in use, with overall compression ratios of 4:1 (keeping 4 noise bits), up to as much as 8:1 (keeping 1 noise bit).

**Table 1:** Performance Summary. ... Mean Diff. (last four columns): -0.079, -0.024, -0.078, 0.007
Table 1 summarizes the time, memory, and compression ratios (lossless and lossy) achieved by five lossless compressors. The test input was a simulated NGST deep image (1024 x 1024, 2 bytes per pixel, DN units) obtained from a 10³ s exposure after the cosmic ray removal and slope fitting algorithm. A readout noise of 15 electrons and a gain of 4 are assumed. The scaling function applied for lossy compression is $N_B\sqrt{G}\sqrt{D+Y'}$. Values of the Normalized Root-Mean-Square Error and Mean Difference are also shown. Rcomp and Uses are implementations of the Rice algorithm: Rcomp was developed at STScI by Rick White, and Uses at the University of New Mexico Microelectronics Research Center. LLjpeg is a lossless JPEG implementation developed by the PVRG-JPEG. gzip and compress are the well-known general-purpose compression programs based on dictionary techniques. The tests were run on a Sun Ultra 10.

3. Lossy Compression: Square-Root Prescaling

Pure noise by its very nature is impossible to compress. In order to compress the data and retain as much information as possible, it is useful to eliminate the low order data bits (i.e., the noise). Truncating the bits at some level is one possibility, but the noise level differs across the picture. We describe a simple approximation to the noise which is carried in the data itself.

There are two important sources of non-systematic noise: the Poisson distribution of the photons themselves and the readout noise in the readout electronics. From these two sources, a reasonable approximation to the standard deviation is $\sigma = \sqrt{S+R}$ where S is the signal in units of photons (or equivalently electrons) and R is the readout variance in the same units.

In order to simplify both the encoding and the decoding, we make the approximation $\sigma \simeq \sqrt{S+Y}$, where S is the readout (in electrons) and Y is a number that we derive from the readout variance R. First, note that an absolute offset does not incur a significant penalty in data transmission, since virtually any lossless data compression scheme will use very little bandwidth to transmit the offset in a large block of data. The key, then, is to get a good approximation to the derivative. If we use $\sqrt{S+Y}$ for $S/\sqrt{S+R}$, the derivatives already match for large S. Taking the derivatives with respect to S and setting them equal to each other at S = 0, we have Y = R/4. Thus we can use $\sqrt{S+R/4}$ followed by truncation as a lossy compression that reduces the data dynamic range in an approximately uniform way. Since our NGST simulation is in data numbers (DN), we must rescale this formula to $\sqrt{G}\sqrt{D+Y'}$, where G is the gain, D is the signal in units of DN, and $Y' = R/(4G)$. Therefore, assuming a readout noise of 15 electrons and a gain of 4, $Y' \approx 14$ (note that R = 15² = 225).
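The derivative-matching step can be written out explicitly. The following is a worked version of the argument above, using only the symbols already defined (S, R, Y, G, D, Y'); the function names f and g are introduced here purely for illustration.

```latex
\begin{align*}
f(S) &= \sqrt{S+Y}, & f'(S) &= \frac{1}{2\sqrt{S+Y}}, \\
g(S) &= \frac{S}{\sqrt{S+R}}, & g'(S) &= \frac{S+2R}{2\,(S+R)^{3/2}}. \\[4pt]
\text{Large } S: \quad & f'(S) \to \frac{1}{2\sqrt{S}}, & g'(S) &\to \frac{1}{2\sqrt{S}}. \\[4pt]
S = 0: \quad & f'(0) = \frac{1}{2\sqrt{Y}}, & g'(0) &= \frac{1}{\sqrt{R}}
  \;\Longrightarrow\; Y = \frac{R}{4}. \\[4pt]
S = G D: \quad & \sqrt{G D + R/4} = \sqrt{G}\,\sqrt{D + \frac{R}{4G}}, &
  Y' &= \frac{R}{4G} = \frac{15^2}{4 \cdot 4} \approx 14.
\end{align*}
```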

The remaining issue is where to do the truncation. We multiply by a value N_B and round to the nearest integer. The value N_B is then the number of bits into the noise that we save. Setting N_B to 1 severely restricts the noise at each pixel, but it may also affect averages and fits that attempt to pull a signal out of the noise. Larger values of N_B allow the compressed data to match the original values more closely, but at the cost of lower compression ratios (see RMS Error in Table 1). We suggest using $N_B \sim 2$, a value that gets us close to but under the $1/\sqrt{12}$ that we could expect from the digitization noise. Figure 1 shows that this lossy scheme does not introduce any pattern or systematic error into the residual images. The small bias shown is due to the digitization process. If this were a problem in practice, we could add a random bias that would (on average) remove it.
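As an illustration of the prescaling step, the following is a minimal NumPy sketch of the scaling function $N_B\sqrt{G}\sqrt{D+Y'}$ and its inverse, using the readout noise and gain quoted in this paper. The function names (`prescale`, `unscale`, `normalized_error`) are hypothetical, and clipping pixels below -Y' to zero is an assumption made here so that the square root stays real; it is not a description of the code used for the tests in Table 1.

```python
import numpy as np

READ_NOISE_E = 15.0                        # readout noise in electrons
GAIN = 4.0                                 # gain G (electrons per DN)
Y_PRIME = READ_NOISE_E**2 / (4.0 * GAIN)   # Y' = R/(4G), about 14 DN


def prescale(dn, n_b=2.0):
    """Lossy step: round N_B * sqrt(G) * sqrt(D + Y') to the nearest integer."""
    dn = np.asarray(dn, dtype=np.float64)
    # Assumption: values below -Y' are clipped so the square root is defined.
    shifted = np.maximum(dn + Y_PRIME, 0.0)
    return np.rint(n_b * np.sqrt(GAIN) * np.sqrt(shifted)).astype(np.int32)


def unscale(code, n_b=2.0):
    """Invert the scaling (up to the rounding error) to recover DN values."""
    code = np.asarray(code, dtype=np.float64)
    return (code / (n_b * np.sqrt(GAIN)))**2 - Y_PRIME


def normalized_error(dn, n_b=2.0):
    """Round-trip error in electrons, in units of sigma = sqrt(S + R)."""
    dn = np.asarray(dn, dtype=np.float64)
    restored = unscale(prescale(dn, n_b), n_b)
    sigma_e = np.sqrt(GAIN * dn + READ_NOISE_E**2)
    return GAIN * (restored - dn) / sigma_e


if __name__ == "__main__":
    dn = np.arange(0, 65536)               # full 16-bit range, in DN
    err = normalized_error(dn, n_b=2.0)
    print("largest code:", prescale(dn).max())
    print("RMS error / sigma:", np.sqrt(np.mean(err**2)))
```

The integer codes returned by `prescale` would then be passed to the lossless encoder. Because the rounding error of at most half a code unit maps back to a fraction of the local noise, the error introduced at each pixel scales with the noise at that pixel rather than being a fixed number of DN.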

**Figure 1:** A bright galaxy from a simulated NGST long exposure (*left*). Residual plots varying the number of bits into the noise (*right*).

4. Compression Benefits from Cosmic Ray Removal

For the maximum exposure time we have adopted, 10³ s, it is expected that $10\%$ of the image is affected by cosmic ray hits (Stockman et al. 1998). Most cosmic rays produce a high value in a single pixel, unrelated to the values in its vicinity. All tested compression programs benefit from prior cosmic ray removal. The Rice implementation used in this paper utilizes a linear first-order unit-delay predictor whose output is equal to the difference between the input data value and the preceding data value (CCSDS 1997). This algorithm particularly benefits from cosmic ray (CR) removal.
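The effect of a cosmic ray hit on a unit-delay predictor can be sketched in a few lines. The example below is a simplified illustration, not the Rcomp, Uses, or CCSDS implementation: the function names are hypothetical, the pixel values are made up, and the cost is estimated with a plain Golomb-Rice code using a single parameter k for the whole row rather than the adaptive per-block selection of a real Rice coder.

```python
from itertools import accumulate


def unit_delay_residuals(row):
    """First sample kept as a reference, then each value minus its predecessor.

    Assumes a non-empty sequence of integers.
    """
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]


def reconstruct(residuals):
    """Invert the predictor with a running sum (the scheme is lossless)."""
    return list(accumulate(residuals))


def zigzag(r):
    """Map signed residuals to non-negative integers: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * r if r >= 0 else -2 * r - 1


def rice_bits(residuals, k):
    """Bits used by a Golomb-Rice code with parameter k:
    a unary quotient (q + 1 bits) plus k remainder bits per sample."""
    return sum((zigzag(r) >> k) + 1 + k for r in residuals)


def best_rice_bits(residuals, max_k=15):
    """Smallest total over k = 0..max_k (a stand-in for adaptive selection)."""
    return min(rice_bits(residuals, k) for k in range(max_k + 1))


if __name__ == "__main__":
    clean = [100, 101, 103, 102, 104, 103]    # smooth, made-up pixel row
    hit = [100, 101, 103, 5000, 104, 103]     # same row with one CR hit
    for name, row in (("clean", clean), ("hit", hit)):
        res = unit_delay_residuals(row)
        # The reference sample is excluded from the cost estimate.
        print(name, res, best_rice_bits(res[1:]), "bits")
```

A single affected pixel produces two large residuals, one entering the spike and one leaving it, so the coded size of the neighbourhood grows well beyond that of the clean row. This is consistent with the observation that the predictor-based coders gain the most from prior cosmic ray removal.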

5. Conclusions

Acknowledgments

References

Consultative Committee for Space Data Systems 1997, Lossless Data Compression, (CCSDS 120.0-G-1 Green Book), (Washington: NASA)

Offenberg, J. D., Sengupta, R., Fixsen, D. J., Stockman, P., Nieto-Santisteban, M., Stallcup, S., Hanisch, R., & Mather, J. C. 1999, this volume, 141

Rice, R. F., Yeh, P.-S., & Miller, W. H. 1993, in Proc. of the 9th AIAA Computing in Aerospace Conf., (AIAA-93-4541-CP), American Institute of Aeronautics and Astronautics

Stiavelli, M. & White, R. L. 1997, STScI Instrument Science Report, (STScI Publ. ACS-97-02), (Baltimore: STScI)

Stockman, H. S., Fixsen, D. J., Hanisch, R. J., Mather, J. C., Nieto-Santisteban, M. A., Offenberg, J. D., Sengupta, R. & Stallcup, S. 1998, astro-ph/9808051

White, R. L. & Becker, I. 1998, in SPIE Proc., Vol. 3356, Space Telescopes and Instruments V, ed. P. Y. Bely & J. B. Breckinridge, (Bellingham: SPIE), 823