The amount of astronomical data accessible over the network is increasing at an unprecedented rate, offering the astronomer a wealth of data from many servers around the world, in a wide (should we say wild?) diversity of formats and presentations. Answering one of the most frequently asked questions, ``what exists in the region of the sky I am studying?'', involves three steps: (1) locate the potentially interesting servers in a changing world; (2) find out how to code the query parameters; and (3) interpret the results. Steps 1 and 2, which Hanisch (2000) calls good and better in his presentation, could be automated by tools like AstroBrowse (McGlynn & White 1998), where the diversity of servers and of the ways to query them is coded in a distributed dictionary of URLs, the so-called GLU (Fernique et al. 1998). Our purpose here is to tackle the third step: interpreting the heterogeneous results coming from the numerous databases and services.
Our aim is not just to display the heterogeneous results nicely in a web browser, but to let a program interpret the results for further processing, e.g. for cross-correlation or visualization. A typical example is the visualization of sky images superimposed with information coming from catalogue or archive services, as done by Aladin or Skycat: in order to overlay symbols showing the exact location of catalogued sources on top of images of the sky, the application has to query many resources for catalogued data existing in the target region (steps 1 and 2), and then find out, in the documents returned by the various servers, how the positions of the retrieved sources are coded (step 3). This means that not only the actual numbers representing the coordinates have to be found, but also that minimal details about the coordinate representation (the metadata) have to be known: units, coordinate systems, accuracy, etc.
For the Aladin application, the solution adopted up to now was to express the output of the main CDS data sources, VizieR and Simbad, in a data format dedicated to this usage. The interpretation of the parameters provided by the NED database was based on a simple analysis of the HTML text coming out of the NED server, a solution which can easily break down if the NED layout happens to change. Neither of these two interfacing methods, developing a dedicated data format or trying to interpret HTML documents, is suitable for the long run: the first imposes constraints on the server side for no obvious benefit, and the second is fragile.
XML (eXtensible Markup Language) is basically a ``superset'' of HTML in which the various markup ``tags'' like <TITLE>my title</TITLE> are defined in a DTD (Document Type Definition). The basic advantage of XML is that the same document can either be parsed by simple-minded programs (XML uses hierarchical structuring), or be displayed in the new generation of browsers (via an XSL style sheet which maps the DTD tags into typographical specifications) with the same capabilities as HTML documents. There are other potential benefits, especially regarding interoperability: many documents, each with its own DTD, can be analyzed with generic tools. XML is a developing standard, proposed in many different contexts for astronomy; see http://pioneer.gsfc.nasa.gov/public/xml/
HTML, the markup language in which the vast majority of data servers currently provide their results, has a frozen set of tags designed for the visual presentation of documents (typically typographical specifications). It is therefore impossible to have, e.g., a tag dedicated to marking the sky coordinates of the astronomical objects quoted in the document. HTML documents also generate a significant overhead, especially with the TABLE tag, which is the natural way of presenting catalogued results.
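To give a feel for this overhead, the Python sketch below serializes the same rows once as a bare HTML table and once as character-separated values, and compares the sizes. The rows are invented sample values and the helper names are hypothetical; the actual ratio depends on the data and on how verbose the HTML produced by a given server is.

```python
def as_html_table(rows):
    """Serialize rows as a minimal HTML TABLE: one TR per row, one TD per value."""
    body = "".join(
        "<TR>" + "".join(f"<TD>{value}</TD>" for value in row) + "</TR>\n"
        for row in rows
    )
    return "<TABLE>\n" + body + "</TABLE>\n"


def as_csv(rows, sep="|"):
    """Serialize the same rows as character-separated values, one line per row."""
    return "".join(sep.join(str(value) for value in row) + "\n" for row in rows)


# Invented sample rows (position in degrees, magnitude), repeated to mimic a catalogue.
ROWS = [(10.50, 41.25, 12.3), (10.75, -5.00, 9.8)] * 50

html = as_html_table(ROWS)
csv_text = as_csv(ROWS)

if __name__ == "__main__":
    # Print both sizes in characters and the HTML/CSV size ratio.
    print(len(html), len(csv_text), round(len(html) / len(csv_text), 2))
```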
On the other hand, FITS, also widely used in astronomy, fully supports tabular data. FITS tables, however, are used only in astronomy and require specific tools even to read their contents; moreover, there are two different formats for tables. Although FITS cannot be included in XML documents, we took care to preserve compatibility with FITS, in the sense that the definitions required for FITS can be included in our XML definitions.
Figure 1 is an incomplete illustration of the basic layout of an astrores document; the full set of definitions and the related DTD can be accessed at http://vizier.u-strasbg.fr/doc/astrores.htx . The basic tags describe resources, each resource being made of one or several tables, and each table being made of a set of fields and their corresponding values. Several formats can be used in the actual data part; the one shown in Figure 1 is the character-separated-value (CSV) format, which has the advantage of being very compact and quite easy to produce from a spreadsheet utility.
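As a rough illustration of how a program might read such a layout, the Python sketch below parses a simplified astrores-like document with the standard-library xml.etree.ElementTree module. Only RESOURCE, TABLE, and FIELD are taken from the description above; the DATA and CSV elements, the name, unit, and colsep attributes, and the sample values are assumptions made for this example and should be checked against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document. Only RESOURCE, TABLE and FIELD are named
# in the text; DATA, CSV, colsep, name and unit are assumptions, and the
# values are invented.
DOC = """\
<RESOURCE>
  <TABLE>
    <FIELD name="RA" unit="deg"/>
    <FIELD name="Dec" unit="deg"/>
    <FIELD name="Vmag" unit="mag"/>
    <DATA><CSV colsep="|">
10.50|41.25|12.3
10.75|-5.00|9.8
</CSV></DATA>
  </TABLE>
</RESOURCE>
"""


def read_tables(xml_text):
    """Return one (fields, rows) pair per TABLE; rows are dicts keyed by field name."""
    root = ET.fromstring(xml_text)
    tables = []
    for table in root.iter("TABLE"):
        # Keep the unit next to each field name: this is the metadata a client needs.
        fields = [(f.get("name"), f.get("unit")) for f in table.findall("FIELD")]
        csv = table.find("DATA/CSV")
        sep = csv.get("colsep", ",")
        rows = []
        for line in (csv.text or "").splitlines():
            if not line.strip():
                continue  # skip blank lines around the data
            values = [v.strip() for v in line.split(sep)]
            rows.append({name: value for (name, _), value in zip(fields, values)})
        tables.append((fields, rows))
    return tables


if __name__ == "__main__":
    for fields, rows in read_tables(DOC):
        print(fields)
        for row in rows:
            print(row)
```

Values are kept as strings here; a real client would convert them according to the declared field types and units.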
A key point in the astrores definitions is that it is possible to ask for just a description of a resource in terms of tables, fields, queryable parameters, and the way to actually query it. The application receiving this description can complete the parameters described in the FIELD tags by itself, or can generate a query form to be filled. This facility is marked by the meta attribute in the RESOURCE tag. The LINK tag, which specifies how to actually submit a query or how to get more details about some incomplete result, is another important piece in the astrores definitions. It has the structure
<LINK content-role="[query|hints|doc]" [href|action]="..." ...>
where content-role indicates the meaning of the link, one of query, hints, or doc.
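A client might sort the links it receives by role, roughly as in the following Python sketch. It assumes that the role values are query, hints, and doc and that the target is carried by either an href or an action attribute, as the structure above suggests; the function name collect_links and the sample URLs are hypothetical placeholders, and the exact semantics of each role are left to the astrores definitions.

```python
import xml.etree.ElementTree as ET

# Assumed role values (query, hints, doc) and attribute names (href, action).
# The URLs are placeholders, not real services.
SAMPLE = """\
<RESOURCE>
  <LINK content-role="query" action="http://example.org/cgi-bin/search"/>
  <LINK content-role="doc" href="http://example.org/readme.html"/>
</RESOURCE>
"""

KNOWN_ROLES = ("query", "hints", "doc")


def collect_links(xml_text):
    """Group LINK targets by content-role, ignoring unknown roles and empty targets."""
    links = {role: [] for role in KNOWN_ROLES}
    for link in ET.fromstring(xml_text).iter("LINK"):
        role = link.get("content-role")
        target = link.get("href") or link.get("action")
        if role in links and target:
            links[role].append(target)
    return links


if __name__ == "__main__":
    for role, targets in collect_links(SAMPLE).items():
        print(role, targets)
```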
Interoperability between many partners from different disciplines is far from easy (see the ISAIA exercise): it means sharing common metadata, and adopting a common vocabulary. XML with its flexibility and growing usage is however a very good candidate for such developments.
A potential problem is however that, once the data can easily be used and reprocessed, their origin may completely disappear in the applications; the visibility of the original data servers, at least for minimal quality assessments, should be ensured in the various applications making use of the data.
Fernique, P., & Bonnarel, F. 2000, this volume, 71
Fernique, P., Ochsenbein, F., & Wenger, M. 1998, in ASP Conf. Ser., Vol. 145, Astronomical Data Analysis Software and Systems VII, ed. R. Albrecht, R. N. Hook, & H. A. Bushouse (San Francisco: ASP), 466
Hanisch, R. 2000, this volume, 201
McGlynn, T., & White, N. 1998, in ASP Conf. Ser., Vol. 145, Astronomical Data Analysis Software and Systems VII, ed. R. Albrecht, R. N. Hook, & H. A. Bushouse (San Francisco: ASP), 481