Pence, W., White, R. L., Greenfield, P., & Tody, D. 2000, in ASP Conf. Ser., Vol. 216, Astronomical Data
Analysis Software and Systems IX, eds. N. Manset, C. Veillet, D. Crabtree (San Francisco: ASP), 551
A FITS Image Compression Proposal
W. Pence
NASA Goddard Space Flight Center
R. L. White, P. Greenfield
Space Telescope Science Institute
D. Tody
IRAF Group, National Optical Astronomy Observatories
Abstract:
We have developed a general technique for storing compressed images in
FITS binary tables. The image is first divided into one or more
rectangular sub-images or tiles, then each tile is compressed and the
resulting byte stream is stored in a variable length row of a binary
table. By dividing the image into tiles it is possible to extract and
uncompress subsections of the image without the expense of
uncompressing the whole image. Several commonly used algorithms for
compressing the image tiles will be supported initially, and in
principle, support for any other compression algorithm may be added
later. We are in the process of making trial implementations of this
technique within the IRAF image kernel and within the CFITSIO
subroutine library for accessing FITS files. Once completed, these
implementations will allow application programs to transparently read
(and perhaps write) compressed images without needing any knowledge
about the compression algorithm.
With the development of larger and larger imaging detectors there is a
growing need for a data format that will allow the images to be stored
and directly used in a compressed format. To this end we have
developed a general technique for compressing FITS images based on the
scheme first proposed by White & Greenfield (1999) that has a number
of advantages over simply using gzip or UNIX compress on
the entire FITS file:
- Only the image data are compressed; the FITS header and any other
extensions in the file remain uncompressed and hence can be read by
FITS file browsers without having to first uncompress the whole file;
- FITS keywords are used to fully and transparently document how the image has
been compressed;
- Any number of different compression algorithms can be supported; some
algorithms are potentially much more effective for compressing
astronomical images (especially floating point images) than the
commonly used gzip or UNIX compress algorithms;
- Optionally, the image may be divided into a rectangular grid of
sub-images (tiles) which are each compressed separately. Reading programs
may then randomly access sub-images without having to
uncompress the entire image;
- The compressed format is efficient both in terms of disk space and read
access time and so is suitable as a run time data analysis format. The
additional CPU time needed to uncompress the image is offset by the
reduced disk I/O times since significantly fewer bytes need to be read
from disk.
In the following sections we describe the current prototype
implementation of this compression technique. Some of the details may
change as we gain more experience, so readers should consult the latest
on-line version of the format description (available from any of the
authors) for the precise details of the format.
The general principle used in this convention is to first divide the
n-dimensional image into a rectangular grid of sub-images or "tiles".
Each tile is then compressed as a continuous block of data, and the
resulting compressed byte stream is stored in a row of a variable
length column in a FITS binary table. By dividing the image into tiles
it is generally possible to extract and uncompress subsections of the
image without having to uncompress the whole image. The default
tiling pattern treats each row of a 2-dimensional image (or higher
dimensional cube) as a tile, such that each tile contains NAXIS1
pixels. Any other rectangular tiling pattern (including treating the
whole image as a single tile) may be defined using the
ZTILEn keywords that are described below.
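The tiling principle can be illustrated with a minimal Python sketch. It is not the prototype implementation: it assumes a 2-dimensional image of 16-bit integers held as a list of rows, uses the default row-by-row tiling, and uses the standard zlib module as a stand-in for a real compression algorithm. The function names compress_rows and read_rows are hypothetical.

```python
import struct
import zlib


def compress_rows(image):
    """Compress each image row (the default tile) into its own byte stream.

    `image` is a list of rows of 16-bit integers. zlib is only a stand-in
    here; the convention itself does not prescribe this exact encoding.
    """
    tiles = []
    for row in image:
        raw = struct.pack(f">{len(row)}h", *row)  # big-endian, as in FITS
        tiles.append(zlib.compress(raw))          # one stream per table row
    return tiles


def read_rows(tiles, first, last, width):
    """Uncompress only the tiles covering image rows first..last (inclusive)."""
    out = []
    for stream in tiles[first:last + 1]:
        raw = zlib.decompress(stream)
        out.append(list(struct.unpack(f">{width}h", raw)))
    return out


# A small synthetic 64 x 32 image.
image = [[(x * y) % 100 for x in range(64)] for y in range(32)]
tiles = compress_rows(image)

# Extract three rows without touching the other 29 compressed tiles.
subsection = read_rows(tiles, 10, 12, width=64)
```

Because each tile is an independent byte stream, reading a subsection costs only as many decompression calls as there are tiles overlapping it.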
The following keywords are defined to describe the structure
of the compressed image:
- ZIMAGE (required keyword) This keyword must have the
logical value T. It indicates that the FITS binary table extension
contains a compressed image, and that logically this extension should
be interpreted as an image and not as a table.
- ZCMPTYPE (required keyword) The value shall contain a
character string giving the name and version of the algorithm that must
be used to decompress the image. Currently, values of GZIP_1,
RICE_1, PLIO_1, and HCOMPRESS_1 are reserved to
refer to several commonly used algorithms (PLIO stands for the
IRAF Pixel List compression algorithm). We intend to provide a
detailed description of how to uncompress each of these formats in the
final version of the document.
- ZBITPIX (required keyword) The value
shall contain an integer that gives the value of the BITPIX keyword in
the uncompressed FITS image.
- ZNAXIS (required keyword) The value
shall contain an integer that gives the value of the NAXIS keyword in
the uncompressed FITS image.
- ZNAXISn (required keywords) The value
shall contain a positive integer that gives the value of the NAXISn
keyword in the uncompressed FITS image.
- ZTILEn (optional keywords) The value of these indexed
keywords (where n ranges from 1 to ZNAXIS) shall contain a
positive integer representing the number of pixels along axis n
of the compression tiles. All the pixels within each tile are
compressed as a contiguous data array and stored in a row of a
variable-length vector column in the binary table. The size of each
image dimension (given by ZNAXISn) is not required to be an integer
multiple of ZTILEn, and if it is not, then the last tile
along that dimension of the image will contain fewer image pixels than
the other tiles. If the ZTILEn keywords are not present then the
default 'row by row' tiling will be assumed such that ZTILE1 = ZNAXIS1,
and the value of all the other ZTILEn keywords equals 1.
The compressed image tiles are stored in the binary table in the same
order that the first pixel in each tile appears in the FITS image.
- ZNAMEn and ZVALn (optional keywords) These pairs of
optional array keywords (where n is an integer index number starting
with 1) supply the name and value, respectively, of any
algorithm-specific parameters that are needed to compress or uncompress
the image. The value of ZVALn may have any valid FITS data type.
The order of the compression parameters may be significant, and may be
defined as part of the description of the specific decompression
algorithm.
- Other Keywords The binary table header may contain any
additional keywords to provide information about the image. In
general, all the keywords in the header of the FITS image will be
copied verbatim into the header of the compressed binary table and
these keywords will have the same meaning in the binary table as they
did in the image. The mandatory BITPIX, NAXIS, and NAXISn
keywords are not copied and are instead replaced by the ZBITPIX,
ZNAXIS, and ZNAXISn keywords as described above.
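As an illustration of how a reader might interpret the ZNAXISn and ZTILEn keywords, the following Python sketch computes the tile grid and the extent of the tile stored in a given table row. It assumes the header is available as a plain dictionary, that table rows are counted from zero, and that the stated tile ordering implies that the tile index along axis 1 varies fastest; the function names tile_layout and tile_extent are hypothetical.

```python
import math


def tile_layout(header):
    """Return (image_shape, tile_shape, tiles_per_axis) from the Z keywords."""
    naxis = header["ZNAXIS"]
    image = [header[f"ZNAXIS{n}"] for n in range(1, naxis + 1)]
    # Default 'row by row' tiling: ZTILE1 = ZNAXIS1, all other ZTILEn = 1.
    default = [image[0]] + [1] * (naxis - 1)
    tile = [header.get(f"ZTILE{n}", default[n - 1])
            for n in range(1, naxis + 1)]
    # The image size need not be a multiple of the tile size, so round up.
    counts = [math.ceil(size / t) for size, t in zip(image, tile)]
    return image, tile, counts


def tile_extent(header, row):
    """Return (first_pixel, shape) of the tile stored in 0-based table row.

    first_pixel holds 0-based pixel offsets along each axis. The ordering
    (axis 1 fastest) is this sketch's reading of the convention.
    """
    image, tile, counts = tile_layout(header)
    first, shape = [], []
    for size, t, count in zip(image, tile, counts):
        index = row % count
        row //= count
        start = index * t
        first.append(start)
        shape.append(min(t, size - start))  # last tile may be smaller
    return first, shape


header = {"ZNAXIS": 2, "ZNAXIS1": 100, "ZNAXIS2": 50,
          "ZTILE1": 32, "ZTILE2": 32}
image_shape, tile_shape, counts = tile_layout(header)
first, shape = tile_extent(header, 7)  # the last tile along both axes
```

In this example neither image dimension is a multiple of 32, so the tiles along the trailing edge of each axis are truncated rather than padded.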
The following columns in the FITS binary table are defined by this
convention. The order of the columns in the table is not significant.
The column names (given by the TTYPEn keyword) are shown here in
upper case letters, but the case is not significant. Any number of
other columns besides those defined here may be present in the table to
supply other parameters that relate to each image tile.
- COMPRESSED_DATA (required column) Each row of this
variable-length column contains the byte stream that was generated as a
result of compressing the corresponding image tile. The data type of
the column (as given by the TFORMn keyword) will generally be either
'1PB', '1PI', or '1PJ', depending on whether the compression algorithm
generates an output stream of 8-bit bytes, 16-bit integers, or 32-bit
integers, respectively. If it is not possible to efficiently compress
a particular image tile, then the COMPRESSED_DATA vector in the
corresponding row will have a length of zero, and the uncompressed tile
pixels will be written instead to the UNCOMPRESSED_DATA column
described below.
- UNCOMPRESSED_DATA (optional column) This variable length
column will contain the uncompressed pixels for any tiles that cannot
be compressed. The data type of this column will usually correspond to
the data type of the original image. If all the tiles in an image are
compressed, then the UNCOMPRESSED_DATA column is not required.
- ZSCALE and ZZERO (optional columns) These columns give the
linear scale factor and zero point offset which may be needed to
transform the raw uncompressed values back to the original image pixel
values (or at least a close approximation to the original values) using
the following formula:
image_pixel_value = uncompressed_value * ZSCALE + ZZERO
ZSCALE and ZZERO generally have double precision values and
have default values of 1.0 and 0.0, respectively. If the same values of
ZSCALE and ZZERO apply to every tile in the image, then they may be
given as header keywords rather than as table columns.
ZSCALE and ZZERO are typically used to scale floating point
images (with BITPIX = -32 or -64) into integers before
compression, since most compression algorithms are not very efficient
with floating point data.
- ZBLANK (optional column) In cases where floating point
images are converted to integers before being compressed, this column
gives the integer value that is used to represent undefined pixels
(if any) in the image. These pixels would have an IEEE NaN (Not a
Number) value in the uncompressed floating point FITS image. If every
tile uses the same null value, then ZBLANK may be given as a keyword
instead of as a table column. If there are no undefined pixels in the
image then ZBLANK is not required.
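The roles of these columns can be sketched in Python for a single tile of floating point pixels. This is an illustration under several assumptions rather than the prototype implementation: a table row is modeled as a dictionary, zlib stands in for a real compression algorithm, a tile is treated as incompressible when its compressed stream is not shorter than the raw bytes, and ZBLANK is a single value chosen arbitrarily. The way ZSCALE and ZZERO are chosen for a tile is not addressed; they are simply passed in. The function names encode_tile and decode_tile are hypothetical.

```python
import math
import struct
import zlib

ZBLANK = -2147483648  # hypothetical integer used to flag undefined pixels


def encode_tile(pixels, zscale, zzero):
    """Build one table row (as a dict) for a tile of floating point pixels."""
    # Scale to integers; NaN pixels are replaced by the ZBLANK value.
    ints = [ZBLANK if math.isnan(p) else round((p - zzero) / zscale)
            for p in pixels]
    raw = struct.pack(f">{len(ints)}i", *ints)
    stream = zlib.compress(raw)  # stand-in for the ZCMPTYPE algorithm
    row = {"ZSCALE": zscale, "ZZERO": zzero,
           "COMPRESSED_DATA": b"", "UNCOMPRESSED_DATA": []}
    if len(stream) < len(raw):
        row["COMPRESSED_DATA"] = stream
    else:
        # Tile did not compress: leave COMPRESSED_DATA with zero length
        # and store the original pixels instead.
        row["UNCOMPRESSED_DATA"] = list(pixels)
    return row


def decode_tile(row, npix):
    """Recover the pixel values of a tile from its table row."""
    if not row["COMPRESSED_DATA"]:
        return list(row["UNCOMPRESSED_DATA"])
    raw = zlib.decompress(row["COMPRESSED_DATA"])
    ints = struct.unpack(f">{npix}i", raw)
    # image_pixel_value = uncompressed_value * ZSCALE + ZZERO
    return [math.nan if v == ZBLANK else v * row["ZSCALE"] + row["ZZERO"]
            for v in ints]


smooth = [10.0 + 0.25 * (i % 4) for i in range(256)]
smooth[5] = math.nan                      # one undefined pixel
row = encode_tile(smooth, zscale=0.25, zzero=10.0)
restored = decode_tile(row, npix=256)

tiny = encode_tile([1.5, 2.5], zscale=0.25, zzero=0.0)  # too short to compress
```

In general the restored values are only an approximation to the original pixels, with an error set by the choice of ZSCALE; the values in this example were picked so that the scaling is exact.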
We plan to release the prototype implementation of this compression
scheme within the IRAF kernel and the CFITSIO libraries for trial use
by other software developers and data providers. Eventually it is
hoped that this convention will be widely supported and perhaps adopted
as part of the official FITS Standard.
References
White, R. L., & Greenfield, P. 1999, in ASP Conf. Ser., Vol. 172, Astronomical Data
Analysis Software and Systems VIII, eds. D. M. Mehringer, R. L. Plante, &
D. A. Roberts (San Francisco: ASP), 125
© Copyright 2000 Astronomical Society of the Pacific, 390 Ashton Avenue, San Francisco, California 94112, USA