Astronomical tables can come from many different sources, and the original descriptions are therefore very heterogeneous. Automated processing of the contents of these datasets, which is one of the Virtual Observatory (VO) applications, requires a uniform description for the catalogues (with standardized metadata).
The UCDs (Unified Content Descriptors), first developed in the ESO/CDS data mining project (Ortiz et al. 1999), are metadata describing precisely the contents of the individual fields (columns) of tables available from a data center. They have been applied to describe the content of the columns available in the different VizieR tables (Ochsenbein, Bauer & Marcout 2000).
Some tools using UCDs have been developed and are available online: http://vizier.u-strasbg.fr/UCD/.
The UCDs consist of a 4-level hierarchical structure, with approximately 1500 elements. Different branches of the tree correspond to different domains of the semantic classification (e.g., time, position, instrument).
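As a minimal sketch of such a hierarchy, the snippet below builds a nested tree from a few UCD names. It assumes that each underscore-separated segment of a name is one level of the tree; the names are illustrative examples only, and `build_tree` is a hypothetical helper, not part of the CDS tools.

```python
def build_tree(ucds):
    """Build a nested dict in which each level of a UCD name is one node."""
    tree = {}
    for ucd in ucds:
        node = tree
        # Assumption: levels are separated by underscores.
        for level in ucd.split("_"):
            node = node.setdefault(level, {})
    return tree


# Illustrative UCD names (assumed, not an authoritative list).
EXAMPLE_UCDS = ["PHOT_JHN_K", "PHOT_JHN_J", "POS_EQ_RA_MAIN"]

tree = build_tree(EXAMPLE_UCDS)
print(sorted(tree))                  # top-level branches (domains)
print(sorted(tree["PHOT"]["JHN"]))   # leaves under one branch
```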
A tool has been developed to visualize and explore the tree (Figure 1).
Clicking on a leaf gives access to:
The wide heterogeneity of the original descriptions of astronomical data is clearly visible in statistics on the column names and units used to represent a single physical quantity (Figure 2).
One of the most important uses of UCDs is the selection of catalogues that contain exactly a given measurement. Instead of searching all the ``infrared'' catalogues for a K-band magnitude, all catalogues with a Johnson K magnitude can be retrieved instantly.
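Such a selection can be sketched as a lookup in an inverted index from UCD to catalogue. The catalogue names, column names and UCD assignments below are invented for illustration, and the two functions are hypothetical helpers rather than the VizieR implementation.

```python
from collections import defaultdict

# Hypothetical catalogue descriptions: column name -> UCD.
CATALOGUES = {
    "cat_A": {"Kmag": "PHOT_JHN_K", "RAJ2000": "POS_EQ_RA_MAIN"},
    "cat_B": {"K": "PHOT_JHN_K", "Jmag": "PHOT_JHN_J"},
    "cat_C": {"F12": "PHOT_FLUX_IR_12"},
}


def build_ucd_index(catalogues):
    """Map each UCD to the set of catalogues having a column with that UCD."""
    index = defaultdict(set)
    for name, columns in catalogues.items():
        for ucd in columns.values():
            index[ucd].add(name)
    return index


def catalogues_with(index, ucd):
    """Return the sorted names of catalogues that contain the given UCD."""
    return sorted(index.get(ucd, set()))


index = build_ucd_index(CATALOGUES)
print(catalogues_with(index, "PHOT_JHN_K"))
```

Note that the differently named columns `Kmag` and `K` are both found, because the lookup uses the UCD and not the column name.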
This selection can be done with the browser (see Figure 1). It is also possible to translate plain text into relevant UCDs. The user provides one or several terms describing the desired quantity in natural language (e.g., `proper motion'). The answer is a list of corresponding UCDs, tentatively ordered by relevance. These can be used to select the relevant catalogues.
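The text does not specify how relevance is computed. As one possible sketch, the snippet below ranks UCDs by the number of query terms found in a short description of each UCD. The descriptions are invented for illustration and `suggest_ucds` is a hypothetical function.

```python
# Hypothetical one-line descriptions attached to a few UCDs.
UCD_DESCRIPTIONS = {
    "POS_EQ_PM": "proper motion in equatorial coordinates",
    "POS_EQ_PM_RA": "proper motion in right ascension",
    "POS_EQ_RA_MAIN": "right ascension main position",
    "PHOT_JHN_K": "Johnson magnitude K band",
}


def suggest_ucds(query, descriptions):
    """Return UCDs whose description shares words with the query, best first."""
    terms = set(query.lower().split())
    scored = []
    for ucd, text in descriptions.items():
        score = len(terms & set(text.lower().split()))
        if score > 0:
            scored.append((score, ucd))
    # Highest score first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ucd for _, ucd in scored]


print(suggest_ucds("proper motion", UCD_DESCRIPTIONS))
```

A real service would likely need stemming, synonyms and weighting of terms; simple word overlap is used here only to keep the example short.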
If two fields in two tables are described by the same UCD, these fields can be compared because they contain the same quantity. Automated data conversion can then be applied if these fields are expressed in different units (Figure 3).
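The comparison and conversion step can be sketched as follows. The column records, the small table of conversion factors and the function names are assumptions made for this example; a production system would rely on a full unit-handling library.

```python
# Conversion factors to a common reference unit (mas/yr) for a few
# proper-motion units. This small table is an assumption for the example.
TO_MAS_PER_YR = {
    "mas/yr": 1.0,
    "arcsec/yr": 1000.0,
    "arcsec/cen": 10.0,  # 1 arcsec per century = 1000 mas / 100 yr
}


def comparable(col_a, col_b):
    """Two columns hold the same quantity if they carry the same UCD."""
    return col_a["ucd"] == col_b["ucd"]


def convert(values, from_unit, to_unit, factors=TO_MAS_PER_YR):
    """Convert values between two units listed in the factor table."""
    if from_unit not in factors or to_unit not in factors:
        raise ValueError(f"unknown unit: {from_unit!r} or {to_unit!r}")
    scale = factors[from_unit] / factors[to_unit]
    return [v * scale for v in values]


# Hypothetical columns from two different tables.
col_a = {"name": "pmRA", "ucd": "POS_EQ_PM_RA", "unit": "mas/yr",
         "values": [12.0, -3.5]}
col_b = {"name": "mu_alpha", "ucd": "POS_EQ_PM_RA", "unit": "arcsec/yr",
         "values": [0.012, -0.0035]}

if comparable(col_a, col_b):
    # Express the second column in the unit of the first one.
    print(convert(col_b["values"], col_b["unit"], col_a["unit"]))
```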
Because UCDs precisely describe the contents of catalogues, they can be used to find similar catalogues. Given a reference catalogue, the list of UCDs present in this catalogue is used as the criterion for a search among all other catalogues: similar catalogues are those that have many UCDs in common with the reference one.
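A minimal sketch of this search ranks the other catalogues by the number of UCDs they share with the reference. The catalogue contents are invented, and the plain count of common UCDs is an assumed similarity measure; the text does not state which measure is actually used.

```python
# Hypothetical catalogues, each reduced to the set of UCDs of its columns.
CATALOGUE_UCDS = {
    "ref": {"POS_EQ_RA_MAIN", "POS_EQ_DEC_MAIN", "PHOT_JHN_K", "PHOT_JHN_J"},
    "cat_1": {"POS_EQ_RA_MAIN", "POS_EQ_DEC_MAIN", "PHOT_JHN_K"},
    "cat_2": {"POS_EQ_RA_MAIN", "POS_EQ_DEC_MAIN"},
    "cat_3": {"PHOT_FLUX_IR_12"},
}


def rank_similar(reference, catalogues):
    """Rank the other catalogues by the number of UCDs shared with the reference."""
    ref_ucds = catalogues[reference]
    scores = []
    for name, ucds in catalogues.items():
        if name == reference:
            continue
        common = len(ref_ucds & ucds)
        if common > 0:
            scores.append((name, common))
    # Most shared UCDs first; ties are broken by catalogue name.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


print(rank_similar("ref", CATALOGUE_UCDS))
```

Because almost every catalogue carries positional UCDs, a practical measure would probably down-weight such common descriptors or normalize by catalogue size.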
Suggestions have been made to improve the current structure of UCDs. An evolution towards an ``atomic'' rather than hierarchical structure is being studied. UCDs could be built by assembling atomic elements (principal nouns, adjectives, complementary nouns) selected from a predefined set of standard atoms. This scheme allows more flexibility in defining new UCDs, avoids the dispersion of related quantities over different branches of the tree, and describes the data more completely.
Examples of combinations of atoms (compared to current UCDs):
UCDs are currently used in VizieR to describe the semantics of astronomical content. They offer new ways of selecting relevant datasets, and enable cross-catalogue and cross-archive interoperability. Owing to the wide diversity of table contents, UCDs constitute an excellent starting point for a hierarchical description of astronomy, for general data mining purposes. An improved structure relying, for example, on atomic keywords could provide building blocks for the development of astronomical ontologies.
Ortiz, P. et al. 1999, in ASP Conf. Ser., Vol. 172, Astronomical Data Analysis Software and Systems VIII, ed. D. M. Mehringer, R. L. Plante, & D. A. Roberts (San Francisco: ASP), 379
Ochsenbein, F., Bauer, P. & Marcout, J. 2000, A&AS, 143, 23