The current life cycle of astronomical data passes through a bizarre parade of data formats. A probable path in this cycle could be: data are taken at the telescope as FITS, they are submitted to a publisher in LaTeX, the publisher converts these data into SGML, data centers convert the SGML into some form of ASCII (such as HTML), and the end-user downloads this and converts the data back into FITS (!) for use in various analysis software.
So many format translations need not occur, since XML, the eXtensible Markup Language, could provide a reasonable interchange format to stand in at every step of the cycle. XML is designed to be data-centric (unlike HTML or SGML), with the layout/presentation of the information being provided by a separate file (often referred to as a ``style sheet''). This simple separation then allows us, at each step of the data cycle, to re-use the same core XML data file. Another advantage of XML is its description and validation via a DTD (``Document Type Definition''). The DTD may be used both to hold information about the data (such as keyword definitions) and to provide a template for querying databases holding the XML documents. Default values for header descriptions may be obtained from the DTD when the document is parsed, allowing the document to hold only those keywords with non-default values. All XML documents may hold URLs that point back to their respective DTDs, ensuring that the correct DTD may always be found. Finally, as a recommendation of the World Wide Web Consortium (W3C), XML has the weight of the IT community at large behind it. Currently, this Internet standard is receiving massive development support that easily outstrips any spending that astronomers invest in FITS. It makes sense to leverage this resource.
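As a minimal illustration of the default-value mechanism, the following Python sketch parses a small document with the standard-library ElementTree parser. DTD defaults apply to attribute values, so the sketch expresses keywords as attributes. The element and attribute names (``observation'', ``imageType'', ``epoch'') and their defaults are hypothetical and are not taken from the FITSML DTD, and the DTD is embedded as an internal subset only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DTD declares defaults for two attributes,
# and the document itself overrides only one of them.
DOC = """<!DOCTYPE observation [
  <!ELEMENT observation (telescope)>
  <!ELEMENT telescope (#PCDATA)>
  <!ATTLIST observation imageType CDATA "object"
                        epoch     CDATA "2000.0">
]>
<observation epoch="1950.0"><telescope>VLA</telescope></observation>"""

root = ET.fromstring(DOC)

# imageType is absent from the document body, so the parser supplies the
# declared default; epoch keeps the value written in the document.
print(root.get("imageType"))
print(root.get("epoch"))
```

Only the non-default keyword (``epoch'') had to be written in the document; the other was filled in from the declaration at parse time.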
There is no mystery as to why FITS lacks many of the favorable characteristics of XML; the FITS standard was developed in the late 1970s for a different computing environment (VAX/Fortran/tape archives). While FITS has evolved, it still contains many limitations based on its origins that need not be adhered to today, including 8-character keywords, 80-character cards, a maximum of 999 table fields (columns), etc. Yet for all its limitations, FITS still embodies a good understanding of the general needs of astronomical data. Because XML is designed to be the basis of other languages, it is generic in nature and will require development effort to adequately encapsulate scientific data with it.
There is no need to re-invent the wheel here. What is really needed is a marriage between XML and FITS with the goal of re-mapping (rather than redefining) the FITS standard into an XML-based format. We refer to this new hybrid data format standard as ``FITSML'' and briefly discuss important characteristics of this project below.
The Astronomical Data Center (ADC) at Goddard Space Flight Center is in the process of developing a science data interchange language in XML. This new language, XDF (eXtensible Data Format), is formulated to cover only the most basic needs of encapsulating scientific (not just astronomical) data.
We have found that the translation of basic FITS data types into XDF-based FITSML is possible without content loss and without redefining important FITS definitions (keywords). This is possible because the XDF data model is fairly sophisticated and allows for many types of ASCII and binary data, unevenly distributed data cubes, grouping of data with mixed data formats, and vector spaces, among other things. Furthermore, the XDF kernel is designed to allow discipline-specific keywords to be included within XDF-based data formats. This property allows FITSML to look familiar to old hands. XDF also allows for two other important properties of FITS: the addition of user-defined keywords and extension of FITSML into sub-field/mission-specific varieties of FITS.
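To make the idea of re-mapping FITS keywords concrete, the following Python sketch splits 80-character FITS header cards into keyword, value, and comment, and wraps them in XML elements. It is an illustration only: the element and attribute names (``header'', ``keyword'', ``name'', ``comment'') are hypothetical and do not represent the FITSML DTD, the sample cards are invented, and the string handling is simplified (embedded quotes inside string values are not supported).

```python
import xml.etree.ElementTree as ET


def parse_card(card):
    """Split one 80-character FITS header card into (keyword, value, comment)."""
    card = card.ljust(80)[:80]
    keyword = card[:8].strip()
    if card[8:10] != "= ":              # COMMENT, HISTORY, END, blank cards
        return keyword, None, card[8:].strip()
    body = card[10:]
    if body.lstrip().startswith("'"):   # quoted string value
        start = body.index("'") + 1
        end = body.index("'", start)    # simplification: no embedded quotes
        value, rest = body[start:end].rstrip(), body[end + 1:]
    else:                               # numeric or logical value
        value, slash, rest = body.partition("/")
        value, rest = value.strip(), slash + rest
    comment = rest.partition("/")[2].strip()
    return keyword, value, comment


def cards_to_xml(cards):
    """Wrap parsed cards in hypothetical <keyword> elements under <header>."""
    header = ET.Element("header")
    for card in cards:
        keyword, value, comment = parse_card(card)
        if keyword == "END":
            break
        element = ET.SubElement(header, "keyword", name=keyword)
        if value is not None:
            element.text = value
        if comment:
            element.set("comment", comment)
    return header


cards = [
    "TELESCOP= 'VLA     '           / telescope name",
    "NAXIS   =                    2 / number of axes",
    "COMMENT  invented sample header",
    "END",
]
header = cards_to_xml(cards)
print(ET.tostring(header, encoding="unicode"))
```

Every card before END is kept, including commentary cards, so the keyword names survive unchanged in the XML form.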
XDF provides more than a means to XMLize FITS, however. It brings in new (and, we feel, needed) properties that FITS currently lacks. Some of the most important of these features include the following:
<!-- some FITS keywords, but in hierarchy -->
<observation>
  <telescope>VLA</telescope>
  <observer>Syke</observer>
  <imageType>object</imageType>
  <datesAndTimes>
    <observationDate>27/10/1982</observationDate>
  </datesAndTimes>
  <positions>
    <astroObject>3C405</astroObject>
  </positions>
</observation>
... continues ...
</XDF>
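One way to see what the hierarchy adds is to walk it and list each leaf keyword together with its full path. The Python sketch below does this for the ``observation'' fragment shown above, using the standard-library ElementTree parser. The function name ``flatten'' and the slash-separated path notation are assumptions made for illustration and are not part of XDF or FITSML.

```python
import xml.etree.ElementTree as ET

# The <observation> fragment from the listing above.
FRAGMENT = """<observation>
  <telescope>VLA</telescope>
  <observer>Syke</observer>
  <imageType>object</imageType>
  <datesAndTimes>
    <observationDate>27/10/1982</observationDate>
  </datesAndTimes>
  <positions>
    <astroObject>3C405</astroObject>
  </positions>
</observation>"""


def flatten(element, prefix=""):
    """Yield (path, value) for every leaf element below `element`."""
    path = f"{prefix}/{element.tag}" if prefix else element.tag
    if len(element) == 0:               # leaf: carries a keyword value
        yield path, (element.text or "").strip()
    for child in element:               # group: recurse into children
        yield from flatten(child, path)


pairs = dict(flatten(ET.fromstring(FRAGMENT)))
for path, value in pairs.items():
    print(f"{path} = {value}")
```

Grouping elements such as ``datesAndTimes'' carry no value themselves; they only give context to the keywords they contain.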
<!ENTITY newton
  '<unitGroup name="N">
     <apply><times/>
       <meter />
       <kilogram />
       <apply><power/><second /><cn>-2</cn></apply>
     </apply>
   </unitGroup>' >

Unit entities may also be used to create other units, such as the unit ``Pascal'' in the example below, which uses the ``&newton;'' entity in its definition:
<!ENTITY pascal
  '<unitGroup name="Pa">
     <apply><divide/>
       &newton;
       <apply><power/><meter /><cn>2</cn></apply>
     </apply>
   </unitGroup>' >

With units expressed as entities, the FITSML parser may easily examine and decompose unit definitions into the nine constituent basic units.
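The decomposition described above can be sketched in a few lines of Python. The example declares the two entities in an internal DTD subset, lets the standard-library parser expand ``&pascal;'' (and, through it, ``&newton;''), and then reduces the resulting tree to a table of base-unit exponents. The wrapper element ``units'' and the function names ``combine'' and ``decompose'' are assumptions for illustration; this is not the FITSML parser itself.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE units [
<!ENTITY newton '<unitGroup name="N">
  <apply><times/>
    <meter /> <kilogram />
    <apply><power/><second /><cn>-2</cn></apply>
  </apply>
</unitGroup>'>
<!ENTITY pascal '<unitGroup name="Pa">
  <apply><divide/>
    &newton;
    <apply><power/><meter /><cn>2</cn></apply>
  </apply>
</unitGroup>'>
]>
<units>&pascal;</units>"""


def combine(left, right, sign):
    """Add (sign=+1) or subtract (sign=-1) the exponents of two unit tables."""
    out = dict(left)
    for unit, exponent in right.items():
        out[unit] = out.get(unit, 0) + sign * exponent
    return {u: e for u, e in out.items() if e != 0}


def decompose(node):
    """Reduce a unit expression to a {base unit: exponent} table."""
    if node.tag == "unitGroup":
        return decompose(node[0])
    if node.tag != "apply":
        return {node.tag: 1}            # a base unit such as <meter />
    operator, args = node[0].tag, node[1:]
    if operator == "times":
        result = {}
        for arg in args:
            result = combine(result, decompose(arg), +1)
        return result
    if operator == "divide":
        return combine(decompose(args[0]), decompose(args[1]), -1)
    if operator == "power":
        exponent = int(args[1].text)
        return {u: e * exponent for u, e in decompose(args[0]).items()}
    raise ValueError(f"unsupported operator: {operator}")


root = ET.fromstring(DOC)
pascal = decompose(root[0])
print(pascal)
```

Because the entity for the Pascal reuses the Newton entity, the reduction reaches the base units without any unit-specific code.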
XDF and FITSML are works in progress. Although we have examined the encapsulation of standard forms of astronomical data, such as images, spectra, tables, and sky atlases, we continue to examine other types of data in order to give XDF the generality we desire, and to work with the FITS community to further develop the FITSML DTD in order to meet their needs. Towards this end, we look forward to developing a prescription for FITSML to wrap legacy FITS files and to testing the translation of more advanced FITS data formats into FITSML. Other plans include investigating various combinations of style sheets for viewing (and perhaps editing) FITSML within web browsers, and releasing a beta software package for XDF and an alpha software package for FITSML (both written in Java/Perl) in Spring 2001. We hope to provide a beta software package and simple translation tools between FITS and FITSML before the end of 2001.
Space limitations prevent our fully describing FITSML or disclosing the FITSML DTD or samples within this brief article. Please refer to the following web pages for more information: